An example of Initial Data Analysis for longitudinal studies: SHARE project - Denmark
26 June 2023
Chapter 2 SHARE data description
The Survey of Health, Ageing and Retirement in Europe (SHARE) collects data on health and socioeconomic variables of non-institutionalized individuals aged 50 and older across 28 European countries and Israel [Boersch et al]. The data set includes about 140,000 men and women aged 50 or older, with data collected from 2004 to 2018. Waves 1 to 7 are analyzed. Data from Waves 3 and 7 (SHARELIFE interviews) were also considered in this report.
2.2 Sampling in Denmark
The study protocol describes the age-related inclusion criteria by wave. Participants in Wave 1 had to be born in 1954 or before; the study design planned full range refreshment samples in Waves 2 (birth year <=1956) and 5 (birth year <=1962), and refreshment samples of the youngest cohort only in Waves 4 (birth years 1957-60) and 6 (birth years 1963-4). The full range refreshment samples include an over-sampling of the youngest cohorts that were not age-eligible in the previous refreshment samples, to maintain the representation of younger cohorts; their aim is to compensate for the effect of panel attrition on all age cohorts.
2.3 Non-enrollment
The characteristics of the subjects who were selected for participation but did not enter the study are not reported in the data set. Limited information is available only in the documentation published by the study (retrievable at ), while a detailed description of the selected sample is not provided.
2.4 Type of questionnaire
Different types of questionnaire were used during the study. By design, the baseline questionnaire is used for the first interview and the longitudinal questionnaire for the follow-up interviews. The questionnaire used in the SHARELIFE interviews (Wave 3 and partly Wave 7) includes only a subset of the questions from the longitudinal questionnaire, together with additional questions about the life history of the participants.
2.5 Changes in data collection process
The questionnaires used in the study contain thousands of questions divided into modules. Several changes occurred in the questionnaires during the study, and the study documentation describes them. It is important to note that some questions might provide different types of information depending on the type of questionnaire being used, so careful examination of the metadata is required. Given the large number of questions and waves, retrieving all the relevant information can be challenging. Some of this information was used in the analysis strategy (AS); for example, height was used as a time-fixed variable, as for most of the waves it was recorded only in the baseline interviews.
References: Börsch-Supan A, Brandt M, Hunkler C, Kneip T, Korbmacher J, Malter F, et al. Data Resource Profile: The Survey of Health, Ageing and Retirement in Europe (SHARE). Int J Epidemiol. 2013;42: 992–1001
Website: www.share-eric.eu.
SHARE release guide 7.1.1
Chapter 4 Data screening
4.1 Participation profile
4.1.1 Time frame of the study (P1)
Here we summarize the times when interviews were taken (by calendar time or Wave).
4.1.1.1 Distribution of the dates of the interviews (by wave)
The graph below shows the distribution of the dates on which the interviews were carried out, stratified by Wave.
4.1.1.2 Time range for each Wave (baseline and longitudinal interviews)
The wave with most interviews was Wave 5. The distribution of the number of interviews per Wave is shown below, with the range of dates for the interviews performed in different waves.
The time lag between waves was approximately 2 years, with slightly longer gaps between Waves 1 and 2 and between Waves 3 and 4. The shortest time lags were between Waves 2 and 3 and between Waves 5 and 6.
| Wave | Number of interviews | Proportion | Begin (date) | End (date) |
|---|---|---|---|---|
| Wave 1 | 1596 | 0.09 | 2004-04-15 | 2004-11-15 |
| Wave 2 | 2487 | 0.13 | 2006-11-15 | 2007-08-15 |
| Wave 3 | 1979 | 0.11 | 2008-11-15 | 2009-08-15 |
| Wave 4 | 2112 | 0.11 | 2011-02-15 | 2011-08-15 |
| Wave 5 | 3919 | 0.21 | 2013-02-15 | 2013-11-15 |
| Wave 6 | 3514 | 0.19 | 2015-02-15 | 2015-11-15 |
| Wave 7 | 3025 | 0.16 | 2017-03-15 | 2017-10-15 |
We summarized the time between interviews conducted in specific Waves also at individual level (graphically and with summary statistics). NA’s indicate individuals where at least one of the two interviews was missing. Data were summarized in years.
The mean and median times between interviews conducted in consecutive Waves were similar and approximately equal to 2 years, except between Waves 1 and 2 and between Waves 3 and 4, where the lag was somewhat longer. The shortest differences were observed between Waves 2 and 3 and between Waves 5 and 6. The most extreme differences were observed between Waves 1 and 2 (range approximately 2 to 3 years). The variability was highest between Waves 5 and 6.
| | Wave 2 - Wave 1 | Wave 3 - Wave 2 | Wave 4 - Wave 3 | Wave 5 - Wave 4 | Wave 6 - Wave 5 | Wave 7 - Wave 6 |
|---|---|---|---|---|---|---|
| Min. | 2.17 | 1.25 | 1.75 | 1.67 | 1.25 | 1.33 |
| 1st Qu. | 2.42 | 1.67 | 2.16 | 1.92 | 1.66 | 1.92 |
| Median | 2.50 | 1.83 | 2.25 | 2.08 | 1.84 | 2.00 |
| Mean | 2.53 | 1.83 | 2.22 | 2.07 | 1.84 | 2.01 |
| 3rd Qu. | 2.66 | 2.00 | 2.33 | 2.17 | 2.00 | 2.16 |
| Max. | 3.25 | 2.50 | 2.66 | 2.67 | 2.67 | 2.66 |
| NA’s | 4267 | 3477 | 3864 | 3632 | 2266 | 2563 |
Standard deviations of the time differences, in months.
| SD Wave 2 - Wave 1 | SD Wave 3 - Wave 2 | SD Wave 4 - Wave 3 | SD Wave 5 - Wave 4 | SD Wave 6 - Wave 5 | SD Wave 7 - Wave 6 |
|---|---|---|---|---|---|
| 2 | 2.2 | 1.7 | 1.9 | 3.3 | 2.4 |
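The report does not show how these summaries were computed. As a minimal sketch, assuming interview dates are stored per participant and per wave, the lags could be derived as follows; the participant identifiers, the dates and the `lag_years` helper are hypothetical and only illustrate the units used above (years for the summaries, months for the standard deviations).

```python
from datetime import date
from statistics import mean, stdev

# Hypothetical interview dates per participant, keyed by wave number.
# A wave is simply absent when no interview took place.
interviews = {
    "p1": {1: date(2004, 5, 15), 2: date(2006, 11, 15)},
    "p2": {1: date(2004, 8, 15), 2: date(2007, 5, 15)},
    "p3": {1: date(2004, 10, 15)},  # no Wave 2 interview
}

def lag_years(dates, wave_a, wave_b):
    """Years between two interviews, or None if either one is missing."""
    if wave_a not in dates or wave_b not in dates:
        return None
    return (dates[wave_b] - dates[wave_a]).days / 365.25

lags = [lag_years(d, 1, 2) for d in interviews.values()]
observed = [x for x in lags if x is not None]

n_missing = lags.count(None)       # counted as NA's in the summary table
mean_years = mean(observed)        # summaries are reported in years
sd_months = stdev(observed) * 12   # standard deviations are reported in months
```

A participant contributes an NA to a column as soon as one of the two interviews is missing, which is why the NA counts are large for the early waves.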
4.1.1.3 Distribution of baseline (first) interviews and longitudinal interviews by wave/calendar time
The participants were followed up longitudinally, and refreshment samples (new participants) were drawn during the study, as planned. The table below shows that no new participants were included in Wave 3 and 7 (SHARELIFE interviews), and that the largest refreshment samples were included in Wave 2 and 5 (planned full range refreshment samples, while Wave 4 and 6 planned the refreshment sample of the youngest cohort only). Results are presented also graphically using calendar time as time metric, where it can be seen that in the waves where both types of questionnaires were used, data from longitudinal questionnaires were generally collected earlier than those from baseline questionnaires.
| | Baseline | Longitudinal/SHARELIFE |
|---|---|---|
| Wave 1 | 1596 | 0 |
| Wave 2 | 1266 | 1221 |
| Wave 3 | 0 | 1979 |
| Wave 4 | 408 | 1704 |
| Wave 5 | 1872 | 2047 |
| Wave 6 | 228 | 3286 |
| Wave 7 | 0 | 3025 |
More details about the refreshment samples and about differences between questionnaires are given in the following sections.
4.1.2 Time metric (P2)
The analysis strategy (AS) defines age as the time metric in the model. Here we describe age, while later (PE1, Other time metrics) we describe in more detail the main characteristics of two additional time metrics, waves and measurement occasions (defined as the number of waves since the first available measurement, plus 1).
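The definition of measurement occasion can be written as a one-line computation. The function name `measurement_occasion` is an assumption made for illustration and is not taken from the study code.

```python
def measurement_occasion(wave, first_wave):
    """Number of waves since the first available measurement, plus 1."""
    if wave < first_wave:
        raise ValueError("wave precedes the first available measurement")
    return wave - first_wave + 1

# A participant first interviewed in Wave 4 is at the third
# measurement occasion in Wave 6, whether or not Wave 5 was attended.
occasion = measurement_occasion(6, first_wave=4)
```

Under this definition, occasions count waves rather than completed interviews, so a missed wave still advances the occasion number.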
4.1.2.1 Distribution of age
The inclusion criteria specified that age at first interview was at least 50. The sampling design is briefly described in the description of the data.
The distribution of the age of the participants, stratified by Wave (overall, and by baseline or longitudinal interview) is presented graphically.
The overall distribution of age differed somewhat across waves, as did the distribution in the baseline and longitudinal questionnaires, due to the sampling design. The small groups of participants first included in Waves 4 and 6 were, by design, considerably younger than those included in other waves. In Waves 3 and 7, where no refreshment sample was used, it was expected that the distribution of age would be shifted and reflect the 52+ population rather than the 50+ population. Overall, the distribution of age across waves and types of interviews is consistent with the expectations based on the sampling design.
The distribution of ages by wave, stratified by wave of inclusion is shown in the figure below, which presents graphically the aging of the wave cohorts.
The tables below present the summary statistics for age of the observed participants, overall and by sex.
Average age increased somewhat at later waves for both sexes; a similar increase in average age is also observed in the population (data not shown).
| | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|
| Wave 1 | 50 | 56 | 62 | 64.4 | 72 | 97 |
| Wave 2 | 50 | 56 | 63 | 64.5 | 72 | 99 |
| Wave 3 | 51 | 58 | 64 | 65.8 | 73 | 97 |
| Wave 4 | 50 | 57 | 64 | 65.1 | 72 | 99 |
| Wave 5 | 50 | 57 | 64 | 65.4 | 72 | 100 |
| Wave 6 | 50 | 58 | 65 | 65.8 | 72 | 100 |
| Wave 7 | 52 | 60 | 66 | 67.2 | 73 | 101 |
| | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|
| Wave 1 | 50 | 56 | 63 | 65.3 | 74 | 97 |
| Wave 2 | 50 | 56 | 63 | 65.1 | 73 | 99 |
| Wave 3 | 51 | 58 | 64 | 66.3 | 74 | 97 |
| Wave 4 | 50 | 57 | 64 | 65.6 | 73 | 99 |
| Wave 5 | 50 | 57 | 64 | 65.5 | 72 | 100 |
| Wave 6 | 50 | 58 | 65 | 65.9 | 72 | 98 |
| Wave 7 | 52 | 60 | 66 | 67.5 | 74 | 101 |
| | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|
| Wave 1 | 50 | 55 | 61 | 63.4 | 70.8 | 94 |
| Wave 2 | 50 | 56 | 62 | 63.9 | 70.0 | 92 |
| Wave 3 | 51 | 58 | 64 | 65.2 | 71.0 | 94 |
| Wave 4 | 50 | 57 | 63 | 64.5 | 71.0 | 96 |
| Wave 5 | 50 | 57 | 64 | 65.3 | 72.0 | 98 |
| Wave 6 | 50 | 58 | 65 | 65.6 | 72.0 | 100 |
| Wave 7 | 52 | 60 | 66 | 66.9 | 73.0 | 98 |
4.1.3 Participants (P3)
4.1.3.1 Number of participants
Overall, 5452 unique participants were included in the data set, and the number of measurements (interviews) was 18632. Denmark participated in all waves of the study.
4.1.3.2 Number of interviews for each participant
The most common number of interviews was 3 (28% of participants), and the proportions of participants interviewed once or twice were almost identical (18% each). The number of interviews ranged from 1 to 7, and only about 22% of subjects were interviewed 6 or 7 times. The distribution of the number of interviews is given in the table below and shown graphically.
| Number of interviews | Frequency | Proportion |
|---|---|---|
| 1 | 965 | 0.18 |
| 2 | 966 | 0.18 |
| 3 | 1508 | 0.28 |
| 4 | 527 | 0.10 |
| 5 | 307 | 0.06 |
| 6 | 685 | 0.13 |
| 7 | 494 | 0.09 |
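The frequencies in this table can be checked against the totals reported in 4.1.3.1. The sketch below uses only the counts from the table; the variable names are chosen for illustration.

```python
# Number of interviews -> number of participants, copied from the table.
freq = {1: 965, 2: 966, 3: 1508, 4: 527, 5: 307, 6: 685, 7: 494}

n_participants = sum(freq.values())
n_interviews = sum(k * n for k, n in freq.items())

# Proportion of participants per number of interviews.
proportion = {k: round(n / n_participants, 2) for k, n in freq.items()}

# Share of participants interviewed 6 or 7 times.
share_6_or_7 = (freq[6] + freq[7]) / n_participants
```

The two totals reproduce the 5452 participants and 18632 interviews stated above, so the table and the text describe the same data.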
4.1.4 Data collection (PE3)
Different types of questionnaire were used during the study. By design, the baseline questionnaire is used for the first interview and the longitudinal questionnaire for the follow-up interviews. The questionnaire used in the SHARELIFE interviews (Wave 3 and partly Wave 7) includes only a subset of the questions from the longitudinal questionnaire, together with additional questions about the life history of the participants. Meta-data can be used to compare the questions included in different waves/questionnaires.
4.1.4.1 Type of questionnaire
The baseline questionnaire is used for most of the first interviews, and the longitudinal for follow-up interviews, but some exceptions are observed. Here we define a new variable typeQuest that can be used to check the type of questionnaire that was used in the study.
| | Baseline questionnaire | Longitudinal questionnaire | Sharelife | NA | Sum |
|---|---|---|---|---|---|
| Wave 1 | 1596 | 0 | 0 | 0 | 1596 |
| Wave 2 | 1313 | 1174 | 0 | 0 | 2487 |
| Wave 3 | 0 | 0 | 1979 | 0 | 1979 |
| Wave 4 | 416 | 1696 | 0 | 0 | 2112 |
| Wave 5 | 1892 | 2027 | 0 | 0 | 3919 |
| Wave 6 | 265 | 3247 | 0 | 2 | 3514 |
| Wave 7 | 1 | 1189 | 1835 | 0 | 3025 |
| Sum | 5483 | 9333 | 3814 | 2 | 18632 |
| | M1 | M2 | M3 | M4 | M5 | M6 | M7 |
|---|---|---|---|---|---|---|---|
| Baseline questionnaire | 5450 | 17 | 3 | 4 | 5 | 4 | 0 |
| Longitudinal questionnaire | 2 | 2990 | 1128 | 1714 | 1575 | 1358 | 566 |
| Sharelife | 0 | 1203 | 2232 | 281 | 0 | 32 | 66 |
| NA | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
The baseline questionnaire was used more than once for some participants (n=33), a longitudinal questionnaire was used at the first measurement for 2 participants, and the questionnaire type was unknown for 2 participants. In Wave 7 the baseline questionnaire was used for 1 participant (by design it should not have been used).
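Exceptions of this kind can be flagged by comparing the questionnaire type with the measurement occasion. The sketch below is an illustration on made-up records; the record layout, the type labels and the `anomaly` helper are assumptions, and only the three rules mirror the deviations described above.

```python
from collections import Counter

# Hypothetical records: (participant id, measurement occasion, questionnaire type).
records = [
    ("a", 1, "Baseline"), ("a", 2, "Longitudinal"),
    ("b", 1, "Baseline"), ("b", 2, "Baseline"),        # baseline used twice
    ("c", 1, "Longitudinal"), ("c", 2, "Sharelife"),   # no baseline at first occasion
    ("d", 1, "Baseline"), ("d", 2, None),              # type not recorded
]

def anomaly(occasion, type_quest):
    """Return a label for a deviation from the design, or None if consistent."""
    if type_quest is None:
        return "unknown type"
    if occasion == 1 and type_quest != "Baseline":
        return "non-baseline at first occasion"
    if occasion > 1 and type_quest == "Baseline":
        return "baseline repeated"
    return None

# Count each kind of deviation, dropping the consistent records.
flags = Counter(filter(None, (anomaly(occ, tq) for _, occ, tq in records)))
```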
4.1.4.2 Changes in data collection
In the following (M4) we compare the proportions of item-missing values across waves to identify variables that are not available in all waves. This serves either to confirm the information retrieved from the meta-data or to reveal features that might not have been identified by looking at the meta-data alone.
4.2 Missing values
4.2.1 Non-enrollment (M1)
Here the aim is to describe the non-enrolled, participants that were selected but did not participate in the study, and the reasons, if available. Detailed description of the selected sample is not provided.
The documentation published by the study reports that response rates were 63% in Wave 1/2, 80% in Wave 3, 50% in Wave 4, 60% in Wave 5, 47% in Wave 6 and 85% in Wave 7. It was reported that in Wave 1 the response rates were very similar for both sexes and across age groups.
We indirectly compare the responders to their target population in ME1, using publicly available data.
4.2.2 Drop-out (M2) and intermittent missingness (M3)
Here we describe the number and characteristics of participants who dropped out of the study during the follow-up (loss to follow-up and other possible reasons: death, withdrawal, missing by design, if applicable). We also describe participants with intermittent missingness (participants who have missing data for some of the measurements, that is, intermittent or occasional omissions, but who do not drop out of the study). The summaries are based on the participants who had at least one valid interview (unit missingness other than due to non-enrollment).
4.2.2.1 Summaries of missing interviews based on wave as time metric
The follow-up of the subjects (number of interviews by Wave and proportion), stratified by baseline Wave, is shown in the table below and graphically.
The most dramatic decrease in the number of participants is observed between the wave of inclusion and the following wave. Only 40% of the participants included in Wave 1 had a valid interview in Wave 7, and 50% of those included in Wave 2.
| | Wave 1 | Wave 2 | Wave 3 | Wave 4 | Wave 5 | Wave 6 | Wave 7 |
|---|---|---|---|---|---|---|---|
| Wave 1 (n) | 1596 | 1185 | 984 | 875 | 842 | 738 | 632 |
| Wave 1 (prop) | 1.00 | 0.74 | 0.62 | 0.55 | 0.53 | 0.46 | 0.40 |
| Wave 2 (n) | | 1302 | 995 | 823 | 843 | 739 | 656 |
| Wave 2 (prop) | | 1.00 | 0.76 | 0.63 | 0.65 | 0.57 | 0.50 |
| Wave 4 (n) | | | | 414 | 351 | 308 | 281 |
| Wave 4 (prop) | | | | 1.00 | 0.85 | 0.74 | 0.68 |
| Wave 5 (n) | | | | | 1883 | 1472 | 1248 |
| Wave 5 (prop) | | | | | 1.00 | 0.78 | 0.66 |
| Wave 6 (n) | | | | | | 257 | 208 |
| Wave 6 (prop) | | | | | | 1.00 | 0.81 |
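The proportions in the table are the interview counts divided by the size of the cohort at its wave of inclusion. A minimal sketch for the first two cohorts, using the counts from the table (variable names are illustrative):

```python
# Interviews per wave for two inclusion cohorts, starting at the wave of inclusion.
counts = {
    "Wave 1": [1596, 1185, 984, 875, 842, 738, 632],
    "Wave 2": [1302, 995, 823, 843, 739, 656],
}

# Retention: interviews in each wave relative to the cohort size at inclusion.
retention = {
    cohort: [round(n / values[0], 2) for n in values]
    for cohort, values in counts.items()
}

# Loss of participants between consecutive waves, per cohort.
losses = {
    cohort: [values[i] - values[i + 1] for i in range(len(values) - 1)]
    for cohort, values in counts.items()
}
```

In both cohorts the first entry of `losses` is the largest, which is the drop right after inclusion noted in the text; the negative entry for the Wave 2 cohort reflects participants who returned after a missed wave.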
4.2.2.2 Summary of the results about reasons for missing values drop-out
Here we describe the reason for missing values at interview level, summarizing the data by measurement occasion.
Below we show the distribution of the number of available interviews per measurement occasion, categorizing the potential measurements for each participant in each Wave into 7 categories:
- Interview: the measurement was available.
- Administrative censoring/No opportunity to measure: the measurement is not taken because the study ended (for example, participants included in wave 6 have only two possible measurement occasions)
- Death: death was reported in the exit questionnaire (in the graph the dead participants are indicated as dead also in measurement occasions that go beyond the administrative censoring).
- Out of household: not part of the household at the time of interview.
- Out of sample: excluded from the study because of prolonged missingness (participants with non-response in many successive waves are labeled as out of sample); here we define a participant as out of sample at the first missing interview of the sequence that determines the exclusion from the study, so the definition is applied retrospectively.
- Definitive missingness/Missing: unit was missing in the measurement occasion, had no valid interview in later waves, but was not classified as out of sample in the study.
- Intermittent missing: participant was not interviewed in the measurement occasion but an interview at a later wave was obtained.
The vast majority of the subjects were potentially included in the study for at least three measurement occasions. For 37% of the subjects the study ended before the fourth measurement occasion, and for 44% before the fifth (many subjects were included in Wave 4 or 5 and therefore cannot have more than 4 or 3 valid measurements, respectively).
Some participants had intermittent missingness (about 5% or less at each measurement occasion). Missingness by design because participants were not eligible was very rare (out of household, <1%), while administrative censoring and deaths were common, as was loss to follow-up due to other reasons.
| | Interview | Out of household | Intermittent missing | Missing | Out of sample | Death | Administrative censoring |
|---|---|---|---|---|---|---|---|
| M1 (n) | 5452 | 0 | 0 | 0 | 0 | 0 | 0 |
| M1 (prop) | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| M2 (n) | 4211 | 6 | 274 | 592 | 207 | 162 | 0 |
| M2 (prop) | 0.77 | 0.00 | 0.05 | 0.11 | 0.04 | 0.03 | 0.00 |
| M3 (n) | 3363 | 14 | 289 | 788 | 335 | 409 | 254 |
| M3 (prop) | 0.62 | 0.00 | 0.05 | 0.14 | 0.06 | 0.08 | 0.05 |
| M4 (n) | 1999 | 11 | 146 | 364 | 380 | 561 | 1991 |
| M4 (prop) | 0.37 | 0.00 | 0.03 | 0.07 | 0.07 | 0.10 | 0.37 |
| M5 (n) | 1581 | 6 | 73 | 297 | 379 | 726 | 2390 |
| M5 (prop) | 0.29 | 0.00 | 0.01 | 0.05 | 0.07 | 0.13 | 0.44 |
| M6 (n) | 1394 | 9 | 34 | 352 | 379 | 894 | 2390 |
| M6 (prop) | 0.26 | 0.00 | 0.01 | 0.06 | 0.07 | 0.16 | 0.44 |
| M7 (n) | 632 | 3 | 0 | 209 | 380 | 977 | 3251 |
| M7 (prop) | 0.12 | 0.00 | 0.00 | 0.04 | 0.07 | 0.18 | 0.60 |
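The distinction between intermittent missingness, definitive missingness and the retrospective out of sample label depends on what happens after a missed interview. The sketch below is one possible reading of the category definitions given in this section, not the code used for the report; the status labels and the `classify` function are hypothetical, and missed interviews that are later followed by a reported death are simply left as missing here.

```python
def classify(raw):
    """Refine raw per-occasion statuses into the categories of this section.

    raw uses hypothetical labels: "interview", "no interview",
    "out of sample", "out of household", "death", "censored".
    """
    refined = []
    for i, status in enumerate(raw):
        if status != "no interview":
            refined.append(status)
            continue
        later = raw[i + 1:]
        if "interview" in later:
            # A later interview exists, so the gap is only intermittent.
            refined.append("intermittent missing")
            continue
        # First status that follows the current run of missed interviews.
        next_status = next((s for s in later if s != "no interview"), None)
        if next_status == "out of sample":
            refined.append("out of sample")  # label applied retrospectively
        else:
            refined.append("missing")        # assumption: also used before a death
    return refined

example = classify([
    "interview", "no interview", "interview",
    "no interview", "no interview", "no interview", "out of sample",
])
```

In `example`, the first gap is intermittent because an interview follows it, while the final run of missed interviews is relabeled as out of sample from its first occasion onwards.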
4.2.2.2.1 Number of missing interviews excluding deaths
The following tables explore the type of missing interviews, taking into account the number of reported deaths during follow-up, and evaluate the proportion of interviews carried out excluding the subjects that died during follow-up. We evaluate the number of deaths by wave of inclusion and the proportion of participants that survived through waves.
As expected, very few subjects died at the beginning of the follow-up, and most of the deaths involved individuals first included in the first two waves: by Wave 7, 34% of the individuals included in Wave 1 and 22% of those included in Wave 2 had been reported dead, compared with at most 8% of those included in Wave 4 or later.
| | Wave 1 | Wave 2 | Wave 3 | Wave 4 | Wave 5 | Wave 6 | Wave 7 | Sum |
|---|---|---|---|---|---|---|---|---|
| Wave 1 (n) | 0 | 66 | 97 | 89 | 112 | 92 | 84 | 540 |
| Wave 1 (prop) | 0.00 | 0.06 | 0.10 | 0.10 | 0.13 | 0.12 | 0.13 | 0.34 |
| Wave 2 (n) | | 0 | 36 | 57 | 58 | 53 | 76 | 280 |
| Wave 2 (prop) | | 0.00 | 0.04 | 0.07 | 0.07 | 0.07 | 0.12 | 0.22 |
| Wave 4 (n) | | | | 0 | 3 | 3 | 5 | 11 |
| Wave 4 (prop) | | | | 0.00 | 0.01 | 0.01 | 0.02 | 0.03 |
| Wave 5 (n) | | | | | 0 | 54 | 90 | 144 |
| Wave 5 (prop) | | | | | 0.00 | 0.04 | 0.07 | 0.08 |
| Wave 6 (n) | | | | | | 0 | 3 | 3 |
| Wave 6 (prop) | | | | | | 0.00 | 0.01 | 0.01 |
About 20% of the participants have a missing interview at the second measurement occasion; in later measurement occasions the number of missing interviews does not increase as dramatically. It is interesting to note that, taking the reported deaths into account, the number of missing values decreases in later interviews (Wave 1, measurement occasions 6 and 7).
In this section we evaluate in more detail the association between missingness and measured characteristics of the participants. The characteristics are compared using descriptive statistics, and baseline characteristics are compared among groups of participants (with complete response, lost to follow-up, with intermittent missingness, or who died during the study).
4.2.2.3 Descriptive statistics comparing the baseline characteristics by type of missingness
Participants were categorized in those with complete information, intermittent missingness (at least one missing interview followed by at least one valid interview), lost to follow up (only missing interviews from a certain point on or defined as out of sample), not part of the household, and with reported death during study and compared by their baseline characteristics. See the definitions in Section 2, types of missing values.
Baseline characteristics by type of missingness.

| Variable | N | Complete (N=2681) | Death (N=978) | Intermittent missing (N=476) | Lost to follow up (N=1296) | Out of household (N=21) |
|---|---|---|---|---|---|---|
| gender : Female | 5452 | 0.54 1440/2681 | 0.51 494/978 | 0.50 240/476 | 0.53 687/1296 | 0.38 8/21 |
| age_int | 5452 | 52.00 58.00 66.00 60.28 ± 8.79 | 66.00 75.00 81.00 73.29 ± 10.44 | 52.00 58.00 64.00 59.55 ± 8.18 | 53.00 58.00 66.00 60.20 ± 8.57 | 51.00 54.00 59.00 55.95 ± 6.41 |
| age_int_cat : 50-59 | 5452 | 0.54 1452/2681 | 0.12 120/978 | 0.59 282/476 | 0.54 705/1296 | 0.81 17/21 |
| 60-69 | | 0.29 780/2681 | 0.21 202/978 | 0.27 127/476 | 0.30 390/1296 | 0.14 3/21 |
| 70-80 | | 0.14 384/2681 | 0.41 399/978 | 0.13 62/476 | 0.13 166/1296 | 0.05 1/21 |
| 80+ | | 0.02 65/2681 | 0.26 257/978 | 0.01 5/476 | 0.03 35/1296 | 0.00 0/21 |
| weight | 5361 | 66.0 76.0 86.0 77.2 ± 15.2 | 62.5 71.0 81.0 72.7 ± 15.0 | 65.0 76.0 85.0 77.1 ± 15.6 | 66.0 75.0 86.0 76.9 ± 15.0 | 68.0 78.0 90.0 78.6 ± 14.5 |
| height_imp | 5418 | 165.00 172.00 178.00 171.82 ± 9.04 | 163.00 169.00 175.00 169.34 ± 8.80 | 165.00 172.00 178.00 171.66 ± 8.98 | 165.00 172.00 178.00 172.01 ± 9.40 | 165.00 173.00 185.00 174.67 ± 10.26 |
| education_imp : Low | 5428 | 0.17 447/2678 | 0.38 371/969 | 0.19 90/472 | 0.22 282/1288 | 0.05 1/21 |
| Medium | | 0.38 1019/2678 | 0.39 375/969 | 0.41 195/472 | 0.41 531/1288 | 0.48 10/21 |
| High | | 0.45 1212/2678 | 0.23 223/969 | 0.40 187/472 | 0.37 475/1288 | 0.48 10/21 |
| pa_vig_freq | 5423 | 0.67 1798/2677 | 0.35 339/965 | 0.66 311/473 | 0.63 810/1287 | 0.76 16/21 |
| pa_low_freq | 5422 | 0.94 2512/2677 | 0.73 707/964 | 0.95 447/473 | 0.93 1200/1287 | 0.95 20/21 |
| cusmoke_imp : Yes | 5423 | 0.22 590/2679 | 0.34 327/963 | 0.27 126/472 | 0.27 343/1288 | 0.43 9/21 |
| maxgrip | 5272 | 29.0 36.0 48.0 38.5 ± 12.5 | 21.5 29.0 38.0 30.3 ± 11.9 | 28.0 37.5 49.0 38.5 ± 13.1 | 28.0 36.0 48.0 38.3 ± 12.9 | 35.0 50.0 54.0 43.9 ± 13.1 |

a b c represent the lower quartile a, the median b, and the upper quartile c for continuous variables. x ± s represents X ± 1 SD. N is the number of non-missing values.
Deaths were more commonly observed among men and among participants who were older, had lower education, reported less physical activity and more smoking, and had considerably lower grip strength. Complete responders and non-responders for reasons other than death were similar in their baseline characteristics, apart from education (higher among complete responders) and smoking (complete responders smoked less frequently than non-responders). Participants with intermittent missingness were slightly younger than the others. The characteristics of the small group of participants out of household indicated that this was a younger group.
Similar results were obtained also when the analysis was conducted within each wave or when the groups were compared using their missing status at second measurement occasion (data not shown).
4.2.2.4 Deaths: additional details
4.2.2.4.1 Quality of reporting of deaths
Overall, the quality of the reporting of deaths in the data from Denmark was very good. As shown below, the vital status was unknown for few participants and the deaths were reported in a timely manner.
The table below shows the number of participants by vital status (unknown, alive or dead) according to the last available information (Wave 7), as reported in the coverscreen data.
| Unknown (n) | Unknown (%) | Alive (n) | Alive (%) | Dead (n) | Dead (%) |
|---|---|---|---|---|---|
| 53 | 1 | 4421 | 81.1 | 978 | 17.9 |
In Denmark the percentage of participants with unknown vital status at the end of the study was only 1% (data from Denmark are linked with the population registry).
Quality checks on the reported deaths
Here we assess the reporting of death in the dataset.
Overall, 978 participants were reported as dead by the 7th Wave, and the date of death was reported for 1085 participants. However, the two groups were not completely overlapping. Some participants were categorized as dead but their date of death was missing (n=21). The group with a reported date of death but reported as alive in Wave 7 (n=128) consistently had dates of death in 2017 or later, indicating that they were still alive when Wave 7 was conducted.
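The two inconsistent groups can be isolated by cross-checking the death indicator against the date of death. The sketch below uses made-up records with hypothetical field names (`dead_w7`, `death_year`); the final lines only verify that the counts reported above are arithmetically consistent with each other.

```python
# Hypothetical participant records.
participants = [
    {"id": 1, "dead_w7": True, "death_year": 2012},
    {"id": 2, "dead_w7": True, "death_year": None},    # dead, date missing
    {"id": 3, "dead_w7": False, "death_year": 2018},   # died after Wave 7
    {"id": 4, "dead_w7": False, "death_year": None},
]

dead_without_date = [
    p["id"] for p in participants if p["dead_w7"] and p["death_year"] is None
]
date_but_not_dead = [
    p["id"] for p in participants if not p["dead_w7"] and p["death_year"] is not None
]

# Consistency of the reported counts: participants with a date of death are
# those reported dead, minus those without a date, plus those who died later.
n_dead, n_dead_no_date, n_date_not_dead = 978, 21, 128
n_with_date = n_dead - n_dead_no_date + n_date_not_dead
```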
The distribution of the year of death and the Wave where the death was reported in the coverscreen (as retrieved from the cover screen data and described in data cleaning section) is given below.
Most of the attributions are consistent (year of death and first Wave with reported death).
| Year of death | Wave 2 | Wave 3 | Wave 4 | Wave 5 | Wave 6 | Wave 7 | NA |
|---|---|---|---|---|---|---|---|
| 2004 | 8 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2005 | 34 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2006 | 19 | 18 | 0 | 0 | 0 | 0 | 0 |
| 2007 | 5 | 44 | 0 | 0 | 0 | 0 | 0 |
| 2008 | 0 | 57 | 4 | 0 | 0 | 0 | 0 |
| 2009 | 0 | 8 | 64 | 0 | 0 | 0 | 0 |
| 2010 | 0 | 0 | 60 | 0 | 0 | 0 | 0 |
| 2011 | 0 | 0 | 13 | 46 | 0 | 0 | 0 |
| 2012 | 0 | 0 | 0 | 79 | 0 | 0 | 0 |
| 2013 | 0 | 0 | 0 | 42 | 18 | 0 | 0 |
| 2014 | 0 | 0 | 0 | 0 | 113 | 0 | 0 |
| 2015 | 0 | 0 | 0 | 0 | 68 | 31 | 0 |
| 2016 | 0 | 0 | 0 | 0 | 0 | 102 | 0 |
| 2017 | 0 | 0 | 0 | 0 | 0 | 80 | 29 |
| 2018 | 0 | 0 | 0 | 0 | 0 | 44 | 58 |
| 2019 | 0 | 0 | 0 | 0 | 0 | 0 | 41 |
| NA | 0 | 6 | 5 | 6 | 3 | 1 | 4346 |
4.2.2.5 Out of sample: additional details
We explored the characteristics of the participants that were categorized as Out of sample.
Overall, 382 participants were categorized as out of sample at some point during the study, 366 of them in Wave 7.
Here we display the combinations with at least two participants (covering all but 9 out of sample participants).
M1 M2 M3 M4 M5 M6 M7 freq
2 1 -10 -10 -10 -10 -1000 NA 102
3 1 -10 -10 -10 -10 -10 -1000 95
4 1 1 -10 -10 -10 -10 -1000 68
5 1 1 -10 -10 -10 -1000 NA 57
6 1 1 1 -10 -10 -10 -1000 33
7 1 -10 1 -10 -10 -10 -1000 10
8 1 -1000 -1000 -1000 -1000 -1000 -1000 2
9 1 -1000 -1000 NA NA NA NA 2
10 1 -10 -1001 -1000 -1000 -1000 -1000 2
11 1 1 -1000 -1000 NA NA NA 2
We can observe that the vast majority of participants with an Out of sample interview (code -1000) are excluded from the study after 3, 4 or 5 missing interviews (code -10). This suggests that in Denmark rules are applied that exclude participants with prolonged non-response from the study, and that it is appropriate to interpret them as participants lost to follow-up.
The detailed exploration of metadata confirmed this finding (from Wave 7, the participants who did not participate for 3 consecutive interviews, or for whom the end-of-life interview was not completed in two waves, were categorized as out of sample).
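The number of consecutive missing interviews that precede the first out of sample code can be counted directly from the response patterns. The sketch below uses the six most frequent patterns listed above with their frequencies, and the codes -10 (missing) and -1000 (out of sample) as described in the text; the function name `run_before_exclusion` is an assumption.

```python
from collections import Counter

MISSING, OUT_OF_SAMPLE = -10, -1000

# Most frequent patterns (M1..M7) and their frequencies; None stands for NA.
patterns = {
    (1, -10, -10, -10, -10, -1000, None): 102,
    (1, -10, -10, -10, -10, -10, -1000): 95,
    (1, 1, -10, -10, -10, -10, -1000): 68,
    (1, 1, -10, -10, -10, -1000, None): 57,
    (1, 1, 1, -10, -10, -10, -1000): 33,
    (1, -10, 1, -10, -10, -10, -1000): 10,
}

def run_before_exclusion(pattern):
    """Length of the run of missing interviews right before the first out of sample code."""
    if OUT_OF_SAMPLE not in pattern:
        return None
    idx = pattern.index(OUT_OF_SAMPLE)
    run = 0
    while idx - run - 1 >= 0 and pattern[idx - run - 1] == MISSING:
        run += 1
    return run

# Participants per run length, weighted by pattern frequency.
runs = Counter()
for pattern, freq in patterns.items():
    runs[run_before_exclusion(pattern)] += freq
```

For these patterns every exclusion is preceded by a run of 3, 4 or 5 missing interviews, in line with the observation above.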
4.2.2.6 Out of household: additional details
Overall, only 34 participants were categorized as out of household at some point during the study; the numbers were rather uniform across waves.
M1 M2 M3 M4 M5 M6 M7 freq
2 1 -10 -1001 NA NA NA NA 3
3 1 -10 -1001 -1000 -1000 -1000 -1000 2
4 1 1 -1001 NA NA NA NA 2
5 1 1 1 1 1 -1001 NA 2
6 1 -1001 -1001 -1001 -1001 -1000 NA 1
7 1 -1001 -1000 -1000 -1000 -100 NA 1
8 1 -1001 -100 -100 -100 -100 -100 1
9 1 -1001 -10 -1000 -100 -100 -100 1
10 1 -1001 1 1 1 1 1 1
11 1 -1001 1 NA NA NA NA 1
12 1 -10 -1001 -1001 -1001 -1001 NA 1
13 1 -10 -1001 -1001 -10 -100 NA 1
14 1 -10 -10 -1001 1 1 -100 1
15 1 -10 -10 -1001 1 1 1 1
16 1 -10 -10 -10 -1001 -1001 NA 1
17 1 -10 -10 1 1 1 -1001 1
18 1 1 -1001 -1001 -1001 1 1 1
19 1 1 -1001 -1001 1 1 1 1
20 1 1 -1001 -1001 NA NA NA 1
21 1 1 -1001 1 1 1 1 1
22 1 1 -10 -1001 -1001 -1001 NA 1
23 1 1 -10 -1001 -10 -10 -10 1
24 1 1 -10 -10 -10 -1001 1 1
25 1 1 -10 1 1 -1001 NA 1
26 1 1 1 -1001 -10 -10 -100 1
27 1 1 1 1 -1001 1 NA 1
28 1 1 1 1 1 -1001 -1001 1
29 1 1 1 1 1 -1001 -100 1
30 1 1 1 1 1 1 -1001 1
We can observe that most of the combinations appear only once. In some cases participants are categorized as out of sample after having been out of the household. In a few cases the participants who were out of the household re-enter the study (they are interviewed again or appear as having missing interviews). Given the very small number of participants in this group, their further study does not seem of interest. In the statistical analyses these observations should be treated as missing by design.
4.2.3 Variable missingness (item missingness, M4)
Here we describe the missing values for the variables included in the analysis strategy (AS) as appearing in the model addressing the primary research question. The analysis is restricted to the statistical units for which the interviews were available (unit missingness is not addressed in this section). We explore also the amount of missing outcomes among the interviews that were conducted.
4.2.3.1 Item missingness at baseline interview, overall and by sex
Number and percentage of missing values at baseline interview.
| Variable | Overall: missing (count) | Overall: missing (%) | Female: missing (count) | Female: missing (%) | Male: missing (count) | Male: missing (%) |
|---|---|---|---|---|---|---|
| maxgrip | 180 | 3.30 | 109 | 3.80 | 71 | 2.75 |
| weight | 91 | 1.67 | 73 | 2.54 | 18 | 0.70 |
| height_imp | 34 | 0.62 | 21 | 0.73 | 13 | 0.50 |
| pa_low_freq | 30 | 0.55 | 13 | 0.45 | 17 | 0.66 |
| pa_vig_freq | 29 | 0.53 | 12 | 0.42 | 17 | 0.66 |
| cusmoke_imp | 29 | 0.53 | 13 | 0.45 | 16 | 0.62 |
| education_imp | 24 | 0.44 | 10 | 0.35 | 14 | 0.54 |
| age_int | 0 | 0.00 | 0 | 0.00 | 0 | 0.00 |
| gender | 0 | 0.00 |
Overall, the number of missing items at baseline is very small; the outcome variable maxgrip was the variable with the most missing values (3.3%). Age and sex were not missing for any of the participants at the baseline interview, nor in the longitudinal interviews. For this reason these variables were omitted from further summaries of missing values.
When stratified by sex, the percentages of item-missing values were also low; weight was missing more frequently for women.
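Counts and percentages of this kind can be obtained with a small helper applied to the whole baseline data and to each stratum. The sketch below runs on made-up records; the `missing_summary` function and the use of `None` for a missing item are assumptions, while the variable names follow the tables above.

```python
def missing_summary(rows, variables):
    """Return {variable: (missing count, missing %)} for a list of records."""
    summary = {}
    for var in variables:
        n_missing = sum(1 for row in rows if row.get(var) is None)
        pct = round(100 * n_missing / len(rows), 2) if rows else 0.0
        summary[var] = (n_missing, pct)
    return summary

# Hypothetical baseline interviews; None marks a missing item.
baseline = [
    {"gender": "Female", "maxgrip": 31.0, "weight": None},
    {"gender": "Female", "maxgrip": None, "weight": None},
    {"gender": "Male", "maxgrip": 48.0, "weight": 82.0},
    {"gender": "Male", "maxgrip": None, "weight": 90.0},
]

variables = ["maxgrip", "weight"]
overall = missing_summary(baseline, variables)
by_sex = {
    sex: missing_summary([r for r in baseline if r["gender"] == sex], variables)
    for sex in ("Female", "Male")
}
```

The same helper can be reused for the stratifications by age group and by wave by changing the filter in the dictionary comprehension.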
4.2.3.2 Item missingness at baseline interview, by age group
| Variable | Overall: missing (count) | Overall: missing (%) | 50-59: missing (count) | 50-59: missing (%) | 60-69: missing (count) | 60-69: missing (%) | 70-80: missing (count) | 70-80: missing (%) | 80+: missing (count) | 80+: missing (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| maxgrip | 180 | 3.30 | 57 | 2.21 | 35 | 2.33 | 43 | 4.25 | 45 | 12.43 |
| weight | 91 | 1.67 | 30 | 1.16 | 21 | 1.40 | 22 | 2.17 | 18 | 4.97 |
| height_imp | 34 | 0.62 | 6 | 0.23 | 4 | 0.27 | 11 | 1.09 | 13 | 3.59 |
| pa_low_freq | 30 | 0.55 | 10 | 0.39 | 5 | 0.33 | 7 | 0.69 | 8 | 2.21 |
| pa_vig_freq | 29 | 0.53 | 11 | 0.43 | 5 | 0.33 | 5 | 0.49 | 8 | 2.21 |
| cusmoke_imp | 29 | 0.53 | 10 | 0.39 | 5 | 0.33 | 6 | 0.59 | 8 | 2.21 |
| education_imp | 24 | 0.44 | 13 | 0.50 | 4 | 0.27 | 3 | 0.30 | 4 | 1.10 |
When stratified by age group, the percentages of item-missing values increased somewhat with age, most notably for the outcome variable.
4.2.3.3 Item missingness at baseline interview, by Wave
Summaries of missingness by wave are useful for visualizing possible heterogeneity across waves.
Baseline interviews taken in different waves do not differ substantially in terms of missing values. Note that Wave 4 and 6 had a small number of baseline interviews, therefore deviations of percentages of missing values from the other waves should not be overinterpreted.
4.2.3.4 Item missingness at baseline and longitudinal interviews, by Wave
The variable with the most problematic pattern of missing values in the longitudinal interviews was current smoking. The current smoking information was not recorded in the longitudinal interviews in Waves 6 and 7, nor in the SHARELIFE Wave 3 interviews, while it was recorded in the baseline interviews in all waves. The generated variable cusmoke provided by SHARE, which should report current smoking, was not defined in Waves 6 and 7, even when the data were available (with baseline interviews).
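| Current smoking | Wave 1 | Wave 2 | Wave 3 | Wave 4 | Wave 5 | Wave 6 | Wave 7 |
|---|---|---|---|---|---|---|---|
| No | 1091 | 1782 | 0 | 1590 | 3138 | 210 | 41 |
| Yes | 499 | 649 | 0 | 488 | 775 | 56 | 165 |
| NA | 6 | 56 | 1979 | 34 | 6 | 3248 | 2819 |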
The analyst might decide to consider smoking status at baseline rather than current smoking in the statistical analysis.
The following summaries will use the variable smoking at baseline (time-fixed).
In the following summaries we consider only variables that are time-varying in the AS (education and height and smoking at baseline are excluded).
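| Variable | Wave 1: missing (n) | Wave 1: missing (%) | Wave 2: missing (n) | Wave 2: missing (%) | Wave 3: missing (n) | Wave 3: missing (%) | Wave 4: missing (n) | Wave 4: missing (%) | Wave 5: missing (n) | Wave 5: missing (%) | Wave 6: missing (n) | Wave 6: missing (%) | Wave 7: missing (n) | Wave 7: missing (%) |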
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| weight | 26 | 1.63 | 46 | 1.85 | 1979 | 100.00 | 36 | 1.70 | 60 | 1.53 | 59 | 1.68 | 70 | 2.31 |
| pa_vig_freq | 5 | 0.31 | 57 | 2.29 | 1979 | 100.00 | 35 | 1.66 | 6 | 0.15 | 5 | 0.14 | 1836 | 60.69 |
| pa_low_freq | 4 | 0.25 | 57 | 2.29 | 1979 | 100.00 | 35 | 1.66 | 8 | 0.20 | 5 | 0.14 | 1835 | 60.66 |
| maxgrip | 66 | 4.14 | 91 | 3.66 | 70 | 3.54 | 78 | 3.69 | 142 | 3.62 | 107 | 3.04 | 149 | 4.93 |
There are large differences in missing values between waves, especially for SHARELIFE interviews (Wave 3 and part of Wave 7, where some of the interviews are SHARELIFE interviews), in which some variables are missing by design. The proportion of missing values in the outcome variable was roughly comparable across waves and was highest in the last wave.
In our study the missingness by design in the SHARELIFE interviews of the two variables about physical activity is the most problematic aspect; weight is missing by design in wave 3 but not in wave 7 SHARELIFE interviews. This characteristic indicates that complete case analysis would not be feasible if weight and physical activity are used as explanatory variables in the models.
To further explore the effect of waves and/or baseline vs longitudinal interviews, we repeated the analyses stratifying the results by type of interview (baseline, longitudinal or SHARELIFE interviews). This graph makes the missingness by design easier to understand.
Some variables are missing by design in longitudinal or SHARELIFE interviews, for example current smoking in Wave 6 and 7, or height in Wave 4 and 7. If both variables are used as time fixed variables, measured at baseline, this does not constitute a problem in our study. Other variables missing by design: physical activity variables in SHARELIFE interviews, weight in SHARELIFE interviews of Wave 3.
4.2.3.6 Item missingness by measurement occasion - removing the missing by design missingness
Here we evaluate the percentages of missing values, taking into account in which interviews the values are missing by design (and excluding them).
The proportion of participants with missing values of some time-varying variables is very small if the measurements where the variables are missing by design are not considered. Only for the outcome variable we observed that the proportion of participants with missing values in the outcome (and valid interview) increased at later measurement occasions.
| | M1 | M2 | M3 | M4 | M5 | M6 | M7 |
|---|---|---|---|---|---|---|---|
| weight | |||||||
| % NA | 1.67 | 1.49 | 2.19 | 2.15 | 1.77 | 1.79 | 1.58 |
| NA | 91 | 48 | 52 | 43 | 28 | 25 | 10 |
| n | 5452 | 3216 | 2379 | 1999 | 1581 | 1394 | 632 |
| pa_vig_freq | |||||||
| % NA | 0.53 | 1.40 | 1.33 | 1.16 | 0.13 | 0.22 | 0.00 |
| NA | 29 | 42 | 15 | 20 | 2 | 3 | 0 |
| n | 5452 | 3008 | 1131 | 1718 | 1581 | 1362 | 566 |
| pa_low_freq | |||||||
| % NA | 0.55 | 1.40 | 1.33 | 1.16 | 0.13 | 0.15 | 0.00 |
| NA | 30 | 42 | 15 | 20 | 2 | 2 | 0 |
| n | 5452 | 3008 | 1131 | 1718 | 1581 | 1362 | 566 |
| maxgrip | |||||||
| % NA | 3.30 | 2.90 | 3.93 | 4.20 | 4.24 | 5.31 | 6.96 |
| NA | 180 | 122 | 132 | 84 | 67 | 74 | 44 |
| n | 5452 | 4211 | 3363 | 1999 | 1581 | 1394 | 632 |
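The exclusion of by-design missingness described above can be sketched as follows. The rules encoded in `by_design_missing` are the ones stated in the text (physical activity in SHARELIFE interviews, weight in the SHARELIFE interviews of Wave 3); the record layout, the function names and the toy rows are assumptions made for illustration.

```python
def by_design_missing(variable, wave, interview_type):
    """True if `variable` is not collected by design in this interview."""
    if variable in ("pa_vig_freq", "pa_low_freq"):
        return interview_type == "sharelife"
    if variable == "weight":
        return interview_type == "sharelife" and wave == 3
    return False


def pct_missing_excluding_design(rows, variable):
    """Return (n_missing, n_eligible, pct) after dropping by-design gaps."""
    eligible = [
        r for r in rows
        if not by_design_missing(variable, r["wave"], r["type"])
    ]
    if not eligible:
        return None
    n_missing = sum(1 for r in eligible if r[variable] is None)
    return n_missing, len(eligible), 100 * n_missing / len(eligible)


# Toy rows (hypothetical values, not SHARE data); None stands in for NA
rows = [
    {"wave": 2, "type": "longitudinal", "weight": 80.0},
    {"wave": 3, "type": "sharelife", "weight": None},  # missing by design
    {"wave": 7, "type": "sharelife", "weight": None},  # genuine item missingness
    {"wave": 7, "type": "sharelife", "weight": 72.0},
]
result = pct_missing_excluding_design(rows, "weight")
print(result)
```

Keeping the by-design rules in one function makes it easy to audit them against the study documentation.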
By sex
| | M1 | M2 | M3 | M4 | M5 | M6 | M7 |
|---|---|---|---|---|---|---|---|
| weight Males | |||||||
| % NA | 0.70 | 0.65 | 0.63 | 0.96 | 0.83 | 1.24 | 1.02 |
| NA/n | 18/2583 | 10/1528 | 7/1107 | 9/940 | 6/720 | 8/646 | 3/294 |
| weight Females | |||||||
| % NA | 2.54 | 2.25 | 3.54 | 3.21 | 2.56 | 2.27 | 2.07 |
| NA/n | 73/2869 | 38/1688 | 45/1272 | 34/1059 | 22/861 | 17/748 | 7/338 |
| pa_vig_freq Males | |||||||
| % NA | 0.66 | 1.20 | 0.57 | 1.01 | 0.28 | 0.48 | 0.00 |
| NA/n | 17/2583 | 17/1419 | 3/526 | 8/793 | 2/720 | 3/630 | 0/257 |
| pa_vig_freq Females | |||||||
| % NA | 0.42 | 1.57 | 1.98 | 1.30 | 0.00 | 0.00 | 0.00 |
| NA/n | 12/2869 | 25/1589 | 12/605 | 12/925 | 0/861 | 0/732 | 0/309 |
| pa_low_freq Males | |||||||
| % NA | 0.66 | 1.20 | 0.57 | 1.01 | 0.28 | 0.32 | 0.00 |
| NA/n | 17/2583 | 17/1419 | 3/526 | 8/793 | 2/720 | 2/630 | 0/257 |
| pa_low_freq Females | |||||||
| % NA | 0.45 | 1.57 | 1.98 | 1.30 | 0.00 | 0.00 | 0.00 |
| NA/n | 13/2869 | 25/1589 | 12/605 | 12/925 | 0/861 | 0/732 | 0/309 |
| maxgrip Males | |||||||
| % NA | 2.75 | 2.02 | 2.50 | 2.87 | 3.19 | 4.49 | 5.10 |
| NA/n | 71/2583 | 40/1983 | 39/1562 | 27/940 | 23/720 | 29/646 | 15/294 |
| maxgrip Females | |||||||
| % NA | 3.80 | 3.68 | 5.16 | 5.38 | 5.11 | 6.02 | 8.58 |
| NA/n | 109/2869 | 82/2228 | 93/1801 | 57/1059 | 44/861 | 45/748 | 29/338 |
4.2.3.8 Item missingness of outcome: additional details
We restricted the attention to item missingness of maxgrip across measurement occasions (maxgrip missing, interview performed).
Outcome missingness was between 2.9% and 7.0% across measurement occasions.
Note that the number of interviews is not comparable across measurement occasions, as fewer observations are available for later measurement occasions.
| | M1 | M2 | M3 | M4 | M5 | M6 | M7 |
|---|---|---|---|---|---|---|---|
| Number of participants | 5452 | 4211 | 3363 | 1999 | 1581 | 1394 | 632 |
| Missing | 180 | 122 | 132 | 84 | 67 | 74 | 44 |
| % Missing | 3.3 | 2.9 | 3.9 | 4.2 | 4.2 | 5.3 | 7.0 |
The table below gives the distribution of the number of missing outcome values (columns) by the number of available interviews M (rows).
| | 0 | 1 | 2 | 3 | 4 | 5 | n |
|---|---|---|---|---|---|---|---|
| M = 1 | 877 | 88 | 0 | 0 | 0 | 0 | 965 |
| M = 2 | 883 | 64 | 19 | 0 | 0 | 0 | 966 |
| M = 3 | 1387 | 90 | 23 | 8 | 0 | 0 | 1508 |
| M = 4 | 460 | 45 | 16 | 5 | 1 | 0 | 527 |
| M = 5 | 248 | 38 | 15 | 3 | 2 | 1 | 307 |
| M = 6 | 615 | 51 | 12 | 3 | 1 | 3 | 685 |
| M = 7 | 448 | 35 | 5 | 5 | 1 | 0 | 494 |
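A cross-tabulation of this kind can be obtained by counting, for each participant, the number of available interviews and the number of those with a missing outcome. The sketch below assumes a hypothetical mapping from participant identifier to the list of outcome values at valid interviews, with `None` standing in for NA.

```python
from collections import Counter


def missing_count_table(outcomes_by_id):
    """Count participants by (number of interviews, number of missing outcomes)."""
    table = Counter()
    for values in outcomes_by_id.values():
        n_interviews = len(values)
        n_missing = sum(v is None for v in values)
        table[(n_interviews, n_missing)] += 1
    return table


# Toy data (hypothetical values, not SHARE data)
outcomes_by_id = {
    "a": [30.0],
    "b": [None],
    "c": [28.0, None, 27.0],
    "d": [None, None, 25.0],
    "e": [31.0, 30.0, 29.0],
}
table = missing_count_table(outcomes_by_id)
for (n_interviews, n_missing), n in sorted(table.items()):
    print(f"M = {n_interviews}, missing = {n_missing}: {n}")
```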
4.2.3.8.1 Outcome missingness stratified by age and sex
Here we explore the association between age (time metric from AS) and outcome missingness. The probability of missing outcome considerably increased with age, especially for women. This is shown by using descriptive statistics of the proportion of missing outcomes by sex and age group, in the complete data set (using all observations).
| 50-59 | 60-69 | 70-79 | 80+ |
|---|---|---|---|
| Males | |||
| 1.5 | 1.9 | 3.1 | 11.4 |
| 45/2890 | 57/2989 | 63/1994 | 79/611 |
| Females | |||
| 2.4 | 2.7 | 6.2 | 13.8 |
| 77/3159 | 89/3226 | 140/2104 | 153/956 |
Similar results are obtained using data from first interview only.
| 50-59 | 60-69 | 70-79 | 80+ |
|---|---|---|---|
| Males | |||
| 1.9 | 2.3 | 3.2 | 10.9 |
| 23/1207 | 17/717 | 15/457 | 16/131 |
| Females | |||
| 2.5 | 2.3 | 5.2 | 13.5 |
| 34/1312 | 18/750 | 28/512 | 29/186 |
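Because the oldest age groups are small at the first interview, it can help to attach an interval to each proportion before comparing them. The sketch below uses the Wilson score interval, which is a choice made here for illustration and not a method used in the report; the counts are the 80+ cells of the first-interview table above.

```python
import math


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for a proportion k/n (z=1.96 gives about 95%)."""
    p = k / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


# Missing outcome at first interview, age 80+ (counts from the table above)
males_80 = wilson_interval(16, 131)
females_80 = wilson_interval(29, 186)
print("Males 80+:", males_80)
print("Females 80+:", females_80)
```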
A graphical display is also provided that uses smoothers (method gam in the geom_smooth function) to estimate the probability of missing values by age (at the baseline interview and for each separate wave). As the smoothers can produce unstable estimates, these graphs should not be over-interpreted.
4.2.3.8.2 Description of the participants with outcome missing at all (available) interviews
Overall, 117 participants had missing outcome values at all measurement occasions with a valid interview; most of them (n = 88, 75%) were measured only once.
Compared with participants with at least one non-missing outcome, those with all outcomes missing were older, less physically active, more often female, and less educated.
Baseline characteristics by all-missing outcome status (0: at least one non-missing outcome, 1: all outcomes missing).
| | N | 0 (N=5335) | 1 (N=117) |
|---|---|---|---|
| gender : Female | 5452 | 0.52 2795/5335 | 0.63 74/117 |
| age_int | 5452 | 53.0 60.0 69.0 62.3 ± 10.1 | 59.0 74.0 84.0 72.6 ± 14.1 |
| age_int_cat : 50-59 | 5452 | 0.48 2546/5335 | 0.26 30/117 |
| 60-69 | | 0.28 1482/5335 | 0.17 20/117 |
| 70-80 | | 0.18 981/5335 | 0.26 31/117 |
| 80+ | | 0.06 326/5335 | 0.31 36/117 |
| weight | 5361 | 65.0 75.0 85.0 76.5 ± 15.2 | 60.0 70.0 82.0 71.7 ± 15.6 |
| height_imp | 5418 | 165.00 171.00 178.00 171.47 ± 9.12 | 164.00 168.00 172.25 168.84 ± 9.60 |
| education_imp : Low | 5428 | 0.21 1141/5320 | 0.46 50/108 |
| Medium | | 0.39 2092/5320 | 0.35 38/108 |
| High | | 0.39 2087/5320 | 0.19 20/108 |
| pa_vig_freq | 5423 | 0.61 3253/5320 | 0.20 21/103 |
| pa_low_freq | 5422 | 0.91 4842/5319 | 0.43 44/103 |
| cusmoke_imp : Yes | 5423 | 0.26 1370/5319 | 0.24 25/104 |
a b c represent the lower quartile a, the median b, and the upper quartile c for continuous variables. x ± s represents the mean ± 1 SD. N is the number of non-missing values.
When the previous table is stratified by sex, the results are similar to those of the overall analysis.
Baseline characteristics by all-missing outcome status, by sex (0: at least one non-missing outcome, 1: all outcomes missing).
| | N | 0 | 1 |
|---|---|---|---|
| Male | | N=2540 | N=43 |
| age_int | 2583 | 53.00 60.00 69.00 62.11 ± 9.79 | 57.50 69.00 81.00 70.12 ± 13.76 |
| age_int_cat : 50-59 | 2583 | 0.48 1217/2540 | 0.30 13/43 |
| 60-69 | | 0.29 725/2540 | 0.21 9/43 |
| 70-80 | | 0.18 464/2540 | 0.19 8/43 |
| 80+ | | 0.05 134/2540 | 0.30 13/43 |
| weight | 2565 | 75.0 82.0 92.0 84.0 ± 13.6 | 72.8 81.5 90.5 82.7 ± 14.8 |
| height_imp | 2570 | 173.00 178.00 183.00 178.12 ± 7.03 | 167.75 174.50 184.25 176.22 ± 9.29 |
| education_imp : Low | 2569 | 0.15 385/2531 | 0.32 12/38 |
| Medium | | 0.47 1187/2531 | 0.45 17/38 |
| High | | 0.38 959/2531 | 0.24 9/38 |
| pa_vig_freq | 2566 | 0.64 1625/2531 | 0.20 7/35 |
| pa_low_freq | 2566 | 0.91 2315/2531 | 0.46 16/35 |
| cusmoke_imp : Yes | 2567 | 0.27 686/2531 | 0.28 10/36 |
| Female | | N=2795 | N=74 |
| age_int | 2869 | 53.0 60.0 70.0 62.5 ± 10.4 | 62.2 77.0 86.0 74.1 ± 14.2 |
| age_int_cat : 50-59 | 2869 | 0.48 1329/2795 | 0.23 17/74 |
| 60-69 | | 0.27 757/2795 | 0.15 11/74 |
| 70-80 | | 0.18 517/2795 | 0.31 23/74 |
| 80+ | | 0.07 192/2795 | 0.31 23/74 |
| weight | 2796 | 60.0 68.0 77.0 69.5 ± 13.2 | 56.0 63.0 72.0 65.1 ± 12.0 |
| height_imp | 2848 | 161.00 165.00 170.00 165.42 ± 6.09 | 162.00 165.00 170.00 164.69 ± 6.94 |
| education_imp : Low | 2859 | 0.27 756/2789 | 0.54 38/70 |
| Medium | | 0.32 905/2789 | 0.30 21/70 |
| High | | 0.40 1128/2789 | 0.16 11/70 |
| pa_vig_freq | 2857 | 0.58 1628/2789 | 0.21 14/68 |
| pa_low_freq | 2856 | 0.91 2527/2788 | 0.41 28/68 |
| cusmoke_imp : Yes | 2856 | 0.25 684/2788 | 0.22 15/68 |
a b c represent the lower quartile a, the median b, and the upper quartile c for continuous variables. x ± s represents the mean ± 1 SD. N is the number of non-missing values.
4.2.3.8.3 Reason for missing values in the outcome
Metadata indicate that some variables in the GS module provide information about the reason for missing data in maxgrip in Wave 1. However, the information is not complete (the variable is not recorded in Wave 7 and has a large proportion of missing values in Wave 3).
In the data from Denmark, 37% of the missing outcome values were due to the participant being unable to take the measurement, which indicates that missingness might be related to poor physical condition; in 19% the participant refused to take the measurement, and in 37% the reason for missingness is not known.
| | maxgrip observed: n | maxgrip observed: % | maxgrip missing: n | maxgrip missing: % |
|---|---|---|---|---|
| R agrees to take measurement | 15052 | 84 | 15 | 2 |
| R refuses to take measurement | 1 | 0 | 134 | 19 |
| R is unable to take measurement | 0 | 0 | 263 | 37 |
| Proxy-interview | 0 | 0 | 28 | 4 |
| Reason not recorded (NA) | 2876 | 16 | 263 | 37 |
For about 10% of missing outcome values the participants were unable to use one or both hands (gs002_ variable).
For most participants with a missing outcome, all four individual measurements (two per hand) are missing.
4.2.4 Patterns (M5)
Here we show the co-occurrence of item missingness across variables (the minimum set size displayed is 5; smaller sets are not shown).
4.2.4.1 Co-occurrence of item missingness at baseline
There is no common pattern of missingness between variables at baseline. Most missing values appear in only one variable. Grip strength (maxgrip) is the variable with most missing values, followed by weight.
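Co-occurrence summaries of this kind amount to counting how often each combination of variables is missing together. The sketch below is one possible way to tabulate such patterns; the function name, the `min_size` threshold argument and the toy records are assumptions made for illustration, and a variable absent from a record is treated as missing.

```python
from collections import Counter


def missingness_patterns(records, variables, min_size=1):
    """Count records by the set of variables that are missing together."""
    patterns = Counter()
    for rec in records:
        # rec.get() returns None for absent keys, so they count as missing
        missing = frozenset(v for v in variables if rec.get(v) is None)
        if missing:
            patterns[missing] += 1
    # Keep only patterns observed at least `min_size` times
    return {p: n for p, n in patterns.items() if n >= min_size}


# Toy records (hypothetical values, not SHARE data)
records = [
    {"maxgrip": None, "weight": 70.0, "height_imp": 170},
    {"maxgrip": None, "weight": None, "height_imp": 165},
    {"maxgrip": 30.0, "weight": None, "height_imp": 180},
    {"maxgrip": None, "weight": 82.0, "height_imp": 175},
    {"maxgrip": 41.0, "weight": 90.0, "height_imp": 182},
]
variables = ["maxgrip", "weight", "height_imp"]
patterns = missingness_patterns(records, variables, min_size=2)
for pattern, n in patterns.items():
    print(sorted(pattern), n)
```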
4.2.4.2 Co-occurrence of outcome missingness across measurement occasions
There was no clear association between missingness at different measurement occasions - a relatively small proportion of subjects had co-occurrence of outcome missingness at more than one occasion.
4.2.4.3 Co-occurrence of item-missingness across measurement occasions for time-varying covariates
There is no clear pattern of co-occurrence of missing values of the time varying covariates across measurement occasions. Here we did not consider as missing the variables missing by design (weight in wave 3, PA in SHARELIFE interviews). The graphs are omitted from this report.
4.2.5 Comparison of non-enrolled and target population (ME1)
Here the aim is to understand if the non-enrolled (participants that fulfill the inclusion criteria that do not participate in the study) differ from responders and how they compare to the target population.
Because the SHARE study does not provide data on the non-enrolled (ME1 domain), their characteristics could be studied only indirectly, by comparing the samples of responders with known characteristics of the target population (sex, age and education composition; EUROSTAT data, available from 2007, which corresponds to Wave 2 of the study).
The age, sex and education distributions of the responders were compared to those of the target population (EUROSTAT data, available from 2007, accessed in August 2022) for each of the waves. For Waves 2 and 5 we also analyzed the random refreshment samples (excluding the oversampled younger cohort; the two subsamples can be identified using the study metadata). The comparison of these refreshment samples with the target population is the most straightforward way to study the characteristics of the responders, while comparing the full samples of responders from Waves 2 to 7 with their target population provides a means of assessing the characteristics of non-responders and of participants lost to follow-up. For Waves 3 and 7 the target population was taken to be the population aged 52 and older.
The results of all these analyses indicated that the responders who participated in the survey at least once had substantially higher education than the population in the same age and sex groups; males in the younger age groups were slightly underrepresented, as were older women.
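A simple way to quantify over- or underrepresentation is to subtract the population share of each category from the sample share. The sketch below illustrates the idea; the function name and the counts are hypothetical and are not SHARE or EUROSTAT figures.

```python
def share_differences(sample_counts, population_counts):
    """Sample share minus population share for each category."""
    n_sample = sum(sample_counts.values())
    n_pop = sum(population_counts.values())
    return {
        cat: sample_counts.get(cat, 0) / n_sample - population_counts[cat] / n_pop
        for cat in population_counts
    }


# Hypothetical counts, for illustration only
sample = {"Low": 20, "Medium": 40, "High": 40}
population = {"Low": 350, "Medium": 400, "High": 250}
diffs = share_differences(sample, population)
for cat, diff in diffs.items():
    # Negative values mean the category is underrepresented in the sample
    print(f"{cat}: {diff:+.2f}")
```

In practice the comparison would be repeated within each age and sex group, as in the analyses that follow.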
4.2.5.1 Non-response, using data from Wave 2 and Wave 5
Only Waves 2 and 5 provide full age-range samples that can be used to study the characteristics of non-responders. For presentation purposes, participants aged 85 and older were grouped together because of their small number. Population data on education are mostly missing for individuals older than 85 in 2007; the analyses of education are therefore restricted to individuals below this age.
4.2.5.1.1 Wave 2
Here we restrict the attention to the random refreshment sample from Wave 2 that responded to the interview and compare it to the target population in terms of age, sex and education.
The analysis included 1084 participants from the random sample Wave 2.
Sex
The distribution of sex in the sample and in the population is similar; a deviation can be observed in the younger age group, where males are underrepresented in the sample.
Age
The older women are somewhat underrepresented in the sample compared to the population.
| | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|
| Population females | 50 | 57 | 63 | 65.7 | 74 | 100 |
| Sample W2 females | 50 | 56 | 63 | 64.7 | 72 | 98 |
| Population males | 50 | 56 | 62 | 63.7 | 70 | 100 |
| Sample W2 males | 50 | 57 | 63 | 64.4 | 70 | 92 |
Education
The lower educated individuals are underrepresented in the sample. The differences between the sample and the population seem present in all age and sex groups.
4.2.5.1.2 Wave 5
Here we restrict the attention to the random refreshment sample from Wave 5 that responded to the interview and compare it to the target population in terms of age, sex and education.
The analysis included 1629 participants from the random sample Wave 5.
Sex
The distribution of sex differs between sample and population more than in Wave 2; males in the younger age groups are more underrepresented.
Age
The older women are somewhat underrepresented in the sample compared to the population, as are the younger men.
| | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|
| Population females | 50 | 57 | 65 | 66.0 | 73 | 100 |
| Sample W5 females | 50 | 58 | 64 | 65.4 | 72 | 100 |
| Population males | 50 | 56 | 63 | 64.3 | 71 | 100 |
| Sample W5 males | 50 | 58 | 66 | 66.1 | 72 | 98 |
Education
The lower educated individuals are underrepresented in the sample. The differences between the sample and the population seem present in all age and sex groups - only older women have a similar distribution of education in sample and population.
4.2.5.2 Non-response and loss to follow-up
Here we compare the observed samples (all but Wave 1) with the population values.
Wave 2, all participants
All respondents from Wave 2 vs population.
The analysis included the 2487 participants interviewed in Wave 2.
Sex
The distribution of sex differs between sample and population, males in the younger age groups are more underrepresented.
Age
The older women are somewhat underrepresented in the sample compared to the population, as are the younger men.
| | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|
| Population females | 50 | 57 | 65 | 66.0 | 73 | 100 |
| Sample W2 females | 50 | 56 | 63 | 65.1 | 73 | 99 |
| Population males | 50 | 56 | 63 | 64.3 | 71 | 100 |
| Sample W2 males | 50 | 56 | 62 | 63.9 | 70 | 92 |
Education
The lower educated individuals are underrepresented in the sample. The differences between the sample and the population seem present in all age and sex groups - only older women have a similar distribution of education in sample and population.
Wave 3, all participants
All respondents from Wave 3 vs the population aged 52+ (no refreshment samples in Wave 3). The labels of the youngest age group indicate 50-55 but refer to 52-55.
The analysis included the 1979 participants interviewed in Wave 3.
Sex
As in the other waves, males in the younger age groups are underrepresented.
Age
The older women are somewhat underrepresented in the sample compared to the population, as are the younger men.
| | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|
| Population females | 52 | 59 | 66 | 67.1 | 74 | 100 |
| Sample W3 females | 51 | 58 | 64 | 66.3 | 74 | 97 |
| Population males | 50 | 56 | 63 | 64.3 | 71 | 100 |
| Sample W3 males | 51 | 58 | 64 | 65.2 | 71 | 94 |
Education
The lower educated individuals are underrepresented in the sample. The differences between the sample and the population seem present in all age and sex groups - only older women have a similar distribution of education in sample and population.
Wave 4, all participants
All respondents from wave 4 vs population
The analysis included the 2112 participants interviewed in Wave 4.
Sex
The distribution of sex differs between sample and population; males in the younger age groups are underrepresented.
Age
The older women are somewhat underrepresented in the sample compared to the population, as are the younger men.
| | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|
| Population females | 50 | 57 | 65 | 66.0 | 73 | 100 |
| Sample W4 females | 50 | 57 | 64 | 65.6 | 73 | 99 |
| Population males | 50 | 56 | 63 | 64.3 | 71 | 100 |
| Sample W4 males | 50 | 57 | 63 | 64.5 | 71 | 96 |
Education
The lower educated individuals are underrepresented in the sample. The differences between the sample and the population seem present in all age and sex groups - only older women have a similar distribution of education in sample and population.
Wave 5, all participants
All respondents from Wave 5 vs population.
The analysis included the 3919 participants interviewed in Wave 5.
Sex
The distribution of sex differs between sample and population; males in the younger age groups are underrepresented.
Age
The older women are somewhat underrepresented in the sample compared to the population, as are the younger men.
| | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|
| Population females | 50 | 57 | 65 | 66.0 | 73 | 100 |
| Sample W5 females | 50 | 57 | 64 | 65.5 | 72 | 100 |
| Population males | 50 | 56 | 63 | 64.3 | 71 | 100 |
| Sample W5 males | 50 | 57 | 64 | 65.3 | 72 | 98 |
Education
The lower educated individuals are underrepresented in the sample. The differences between the sample and the population seem present in all age and sex groups - only older women have a similar distribution of education in sample and population.
Wave 6, all participants
All respondents from Wave 6 vs population
The analysis included the 3514 participants interviewed in Wave 6.
Sex
The distribution of sex differs between sample and population; males in the younger age groups are underrepresented.
Age
The older women are somewhat underrepresented in the sample compared to the population, as are the younger men.
| | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|
| Population females | 50 | 57 | 65 | 66.0 | 73 | 100 |
| Sample W6 females | 50 | 58 | 65 | 65.9 | 72 | 98 |
| Population males | 50 | 56 | 64 | 64.5 | 71 | 100 |
| Sample W6 males | 50 | 58 | 65 | 65.6 | 72 | 100 |
Education
The lower educated individuals are underrepresented in the sample. The differences between the sample and the population seem present in all age and sex groups - only older women have a similar distribution of education in sample and population.
Wave 7, all participants
All respondents from Wave 7 vs population of 52+ (no refreshment samples in wave 7). (The labels of the younger age group indicate 50-55 but refer to 52-55).
The analysis included the 3025 participants interviewed in Wave 7.
Sex
As in the other waves, males in the younger age groups are underrepresented.
Age
The older women are somewhat underrepresented in the sample compared to the population, as are the younger men.
| | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|
| Population females | 52 | 59 | 66 | 67.1 | 74 | 100 |
| Sample W7 females | 52 | 60 | 66 | 67.5 | 74 | 101 |
| Population males | 50 | 56 | 63 | 64.3 | 71 | 100 |
| Sample W7 males | 52 | 60 | 66 | 66.9 | 73 | 98 |
Education
The lower educated individuals are underrepresented in the sample. The differences between the sample and the population seem present in all age and sex groups - only older women have a similar distribution of education in sample and population.
4.2.6 Probability of loss to follow-up and death (ME2)
For the purposes of the analysis, participants in some of the groups are classified as lost to follow-up (out of sample, definitive missingness, or out of household if not re-included later). Using this definition we estimated the probability of loss to follow-up (LTF), of death, and of death after loss to follow-up. We estimated the cumulative incidence functions of loss to follow-up and death using Aalen-Johansen estimators (defining death times/events only for those not lost to follow-up, as if LTF were an absorbing state), and used the Kaplan-Meier estimator to estimate the probability of death after loss to follow-up (time of entry = time of LTF; time of end = death; time of censoring = end of the study for those still alive). The estimates were stratified by sex only, or by sex and age group.
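For readers unfamiliar with the Aalen-Johansen estimator, the sketch below shows its competing-risks form: at each event time the cumulative incidence of one cause grows by the all-cause survival just before that time multiplied by the cause-specific hazard. It is a minimal illustration, not the code used for the report. The event coding (0 = censored, 1 = loss to follow-up, 2 = death) and the toy data are assumptions, and the Kaplan-Meier step with delayed entry for death after LTF is not covered.

```python
def cumulative_incidence(times, events, cause):
    """Aalen-Johansen cumulative incidence of `cause` under competing risks.

    events: 0 = censored, any other integer = type of the observed event.
    Returns a list of (time, cumulative incidence) at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0  # all-cause survival just before the current time
    cif = 0.0
    out = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d_cause = d_any = n_at_t = 0
        while i < len(data) and data[i][0] == t:
            event = data[i][1]
            n_at_t += 1
            if event != 0:
                d_any += 1
                if event == cause:
                    d_cause += 1
            i += 1
        if d_any > 0:
            cif += surv * d_cause / n_at_risk
            surv *= 1 - d_any / n_at_risk
            out.append((t, cif))
        # Everyone observed at t (events and censored) leaves the risk set
        n_at_risk -= n_at_t
    return out


# Toy data (hypothetical): time in measurement occasions since first interview
times = [1, 1, 2, 2, 3, 3, 4, 4]
events = [1, 0, 2, 1, 0, 2, 1, 0]  # 1 = lost to follow-up, 2 = death
cif_ltf = cumulative_incidence(times, events, cause=1)
cif_death = cumulative_incidence(times, events, cause=2)
print(cif_ltf[-1], cif_death[-1])
```

Unlike one minus a Kaplan-Meier curve computed separately for each cause, the two cumulative incidence curves obtained this way never add up to more than 1.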
Overall, the estimated probability of loss to follow-up increased most notably at the second interview (about 20% two years after the first interview) and reached about 40% by the end of the study. The estimated probability of death by the end of the study was about 20% prior to drop-out and about 35% after drop-out, and was somewhat larger for males.
The probability of loss to follow-up was virtually the same across age and sex. In contrast, the probability of death prior and post dropout substantially increased with age as expected, and tended to be higher for men at younger ages.
4.2.7 Dropout effect on outcome (ME3)
4.2.7.1 Mean profiles of outcome by time of death
The graphs below show the average grip strength for groups of participants stratified by the measurement occasion of death, participants with complete data (7 observations, category named still in the cohort) are also displayed for comparison. The analyses were stratified by sex and age group. The 70-80 and the 80+ age groups were merged due to the small number of participants that entered the study at an age older than 80.
Participants that die during the study have, from inclusion, lower values of grip strength compared to others, especially among men.
4.2.7.2 Mean profiles of outcome by time to loss to follow-up
A similar analysis was also performed stratifying the participants by the measurement occasion of the last available interview, if later interviews were missing (even though participants may take part again in future waves, as not all of them have been excluded from the study and intermittent missingness is possible). Participants who died during the study are excluded from the graph. The category Complete includes participants with complete information (7 available measurements).
The difference in mean outcome between complete and incomplete cases due to definitive missingness is smaller compared to what was observed for death and specific trends are not observed.
4.3 Univariate descriptions
4.3.1 Description of variables at baseline (U1)
Here we describe the distribution of the outcome and of the explanatory variables at baseline.
The overall summary of all the variables from AS at baseline (categorical and numerical) is given in the table below. We report the distribution of the physical activity variables using four and two levels only in this summary. Later only the binary variables that will be used in modelling are summarized.
Overall characteristics at baseline.
| | N | N=5452 |
|---|---|---|
| gender : Female | 5452 | 0.53 2869/5452 |
| age_int | 5452 | 53.0 60.0 70.0 62.5 ± 10.3 |
| age_int_cat : 50-59 | 5452 | 0.47 2576/5452 |
| 60-69 | | 0.28 1502/5452 |
| 70-80 | | 0.19 1012/5452 |
| 80+ | | 0.07 362/5452 |
| weight | 5361 | 65.0 75.0 85.0 76.4 ± 15.2 |
| height_imp | 5418 | 165.00 171.00 178.00 171.42 ± 9.13 |
| education_imp : Low | 5428 | 0.22 1191/5428 |
| Medium | | 0.39 2130/5428 |
| High | | 0.39 2107/5428 |
| pa_vig : More than once a week | 5423 | 0.46 2519/5423 |
| Once a week | | 0.14 755/5423 |
| One to three times a month | | 0.07 368/5423 |
| Hardly ever, or never | | 0.33 1781/5423 |
| pa_vig_freq | 5423 | 0.6 3274/5423 |
| pa_low : More than once a week | 5422 | 0.81 4400/5422 |
| Once a week | | 0.09 486/5422 |
| One to three times a month | | 0.03 172/5422 |
| Hardly ever, or never | | 0.07 364/5422 |
| pa_low_freq | 5422 | 0.9 4886/5422 |
| cusmoke_imp : Yes | 5423 | 0.26 1395/5423 |
| maxgrip | 5272 | 28.0 35.0 47.0 37.1 ± 12.9 |
a b c represent the lower quartile a, the median b, and the upper quartile c for continuous variables. x ± s represents the mean ± 1 SD. N is the number of non-missing values.
At the baseline interview most participants were in the younger age groups; the vast majority (90%) reported low-intensity physical activity at least once a week, and 60% reported vigorous physical activity at least once a week. About a quarter were smokers, medium and high were the most common education levels (39% each), and there were slightly more women than men.
The distribution of the numerical variables is reported also graphically (the graphical display of categorical variables is omitted).
4.3.1.1 Graphical display for numerical variables at baseline
4.3.1.1.2 Weight
The variable was reported with digit preference (values ending with 0 and 5 were more frequent than expected)
4.3.1.1.3 Height
The variable was reported with digit preference (values ending with 0 and 5 were more frequent than expected)
4.3.1.1.4 Grip strength
The variable was reported with digit preference (values ending with 0 and 5 were more frequent than expected); the distribution is bimodal, reflecting the large difference in the location of the distribution for men and women.
The characteristics observed at baseline were observed also at following measurement occasions.
4.3.1.1.5 Further exploration of digit preference for grip strength
We present the grip strength data with barplots, where the bars of the values with numbers ending with 0 or 5 are plotted in red. Here we display all the available measurements (data by wave were displayed previously).
All the peaks that deviate from the expected shape of the distribution are associated with values that end with 0 or 5.
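Digit preference can also be summarized numerically by the share of values whose last digit is 0 or 5: if terminal digits were roughly uniform, about 20% of values would be expected to end in one of these two digits. The sketch below illustrates the calculation; the function name and the toy values are assumptions made for illustration.

```python
def terminal_digit_share(values, digits=(0, 5)):
    """Share of non-missing values whose last (integer) digit is in `digits`."""
    rounded = [int(round(v)) for v in values if v is not None]
    hits = sum(1 for v in rounded if v % 10 in digits)
    return hits / len(rounded)


# Toy grip strength values in kg (hypothetical, not SHARE data)
grip = [30, 35, 41, 28, 40, 45, 33, 50, 27, 25, None]
share = terminal_digit_share(grip)
expected = 2 / 10  # two of the ten possible terminal digits
print(f"observed {share:.2f} vs expected {expected:.2f}")
```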
4.3.2 Description of the time varying variables at later times (U2)
Here we summarize the longitudinal data of outcome and time-varying independent variables, stratifying the summary statistics by wave. Note that as wave is the time metric of the data collection process, the summaries stratified by wave can be used for the identification of data collection problems. The longitudinal trends of the time varying variables are summarized later (L2 for the outcome and L4 for the time-varying variables).
The digit preference was observed in all waves for weight and the outcome; the proportions did not vary greatly for categorical variables. The changes for age were described in previous sections.
Overall characteristics across waves.

| | N | Wave 1 N=1596 | Wave 2 N=2487 | Wave 3 N=1979 | Wave 4 N=2112 | Wave 5 N=3919 | Wave 6 N=3514 | Wave 7 N=3025 |
|---|---|---|---|---|---|---|---|---|
| gender : Female | 18632 | 0.53 850/1596 | 0.53 1330/2487 | 0.54 1069/1979 | 0.53 1122/2112 | 0.53 2071/3919 | 0.53 1858/3514 | 0.53 1604/3025 |
| age_int | 18632 | 56.00 62.00 72.00 64.40 ± 10.58 | 56.00 63.00 72.00 64.53 ± 10.30 | 58.00 64.00 73.00 65.79 ± 9.93 | 57.00 64.00 72.00 65.11 ± 10.53 | 57.00 64.00 72.00 65.39 ± 10.08 | 58.00 65.00 72.00 65.77 ± 10.03 | 60.00 66.00 73.00 67.23 ± 9.52 |
| age_int_cat : 50-59 | 18632 | 0.40 644/1596 | 0.38 946/2487 | 0.32 641/1979 | 0.36 754/2112 | 0.34 1317/3919 | 0.32 1121/3514 | 0.25 748/3025 |
| 60-69 | | 0.29 455/1596 | 0.31 783/2487 | 0.35 687/1979 | 0.33 707/2112 | 0.35 1376/3919 | 0.35 1231/3514 | 0.37 1122/3025 |
| 70-80 | | 0.22 353/1596 | 0.22 538/2487 | 0.23 448/1979 | 0.21 438/2112 | 0.22 859/3919 | 0.23 820/3514 | 0.28 845/3025 |
| 80+ | | 0.09 144/1596 | 0.09 220/2487 | 0.10 203/1979 | 0.10 213/2112 | 0.09 367/3919 | 0.10 342/3514 | 0.10 310/3025 |
| weight | 16356 | 65.0 74.0 84.0 74.7 ± 14.6 | 65.0 75.0 85.0 75.5 ± 14.7 | | 65.0 75.0 85.0 76.4 ± 15.3 | 65.0 75.0 85.0 76.6 ± 15.4 | 65.0 76.0 86.5 77.2 ± 15.8 | 66.0 76.0 87.0 77.8 ± 15.9 |
| pa_vig_freq | 14709 | 0.60 956/1591 | 0.50 1226/2430 | | 0.53 1097/2077 | 0.61 2405/3913 | 0.62 2178/3509 | 0.59 704/1189 |
| pa_low_freq | 14709 | 0.88 1400/1592 | 0.89 2165/2430 | | 0.89 1843/2077 | 0.90 3510/3911 | 0.91 3182/3509 | 0.88 1042/1190 |
| maxgrip | 17929 | 26.0 34.0 46.0 36.1 ± 13.2 | 25.0 32.0 44.0 34.6 ± 12.6 | 27.0 34.0 46.0 36.2 ± 12.8 | 27.0 35.0 47.0 37.2 ± 12.9 | 27.0 35.0 47.0 36.9 ± 12.4 | 28.0 35.0 47.0 37.0 ± 12.4 | 27.0 35.0 46.0 36.7 ± 12.1 |

a b c represent the lower quartile a, the median b, and the upper quartile c for continuous variables. x ± s represents the mean ± 1 SD. N is the number of non-missing values.
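The summary format described in the table footnote (lower quartile, median, upper quartile, mean ± SD, number of non-missing values) can be sketched as follows. This is an illustrative Python version under stated assumptions: the quantile definition used by the report is not given, so the "inclusive" method of the standard library is an assumption, and `summarize` is a hypothetical name.

```python
import statistics

def summarize(values):
    """Quartiles, mean and SD of the non-missing values (None marks a missing value)."""
    x = [v for v in values if v is not None]
    # Assumption: "inclusive" quantiles; the report may use another definition.
    q1, med, q3 = statistics.quantiles(x, n=4, method="inclusive")
    return {
        "n": len(x),
        "q1": q1,
        "median": med,
        "q3": q3,
        "mean": statistics.mean(x),
        "sd": statistics.stdev(x),  # sample SD (n - 1 in the denominator)
    }

# Hypothetical grip strength values with one missing measurement.
summary = summarize([20, None, 24, 28, 32, 36])
```

Missing values are dropped before summarizing, so `n` plays the role of the N column in the tables.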
The distributions of the numerical variables across waves are also presented graphically.
4.4 Multivariate description of data
4.4.1 Associations at baseline with structural variables (V1)
Here we explore the associations between explanatory variables measured at baseline and age and sex.
We present the association between age and the categorical explanatory variables by plotting the smoothed relationship between age and the value of the variable, stratified by sex. The categorical variables have two categories and are internally coded as 0/1; the smoothed relationship is obtained using the geom_smooth() function with method gam. The association of education with age and sex was extensively explored in ME1 and is therefore not presented here.
Vigorous physical activity in men decreases more sharply after age 60, while the decrease seems more linear for women. Moderate physical activity remains stable up to approximately 70 years and decreases sharply afterwards; men and women show a similar association between activity and age. The proportion of smokers is low at older ages, and women have a lower probability of smoking. Weight and height at baseline are negatively associated with age. Besides ageing, this might be due to a cohort effect, which is further explored in LE1.
4.4.2 Independent variables - Correlation (V2)
We explore the correlation between explanatory variables at baseline.
4.4.2.1 Overall correlation at baseline
Females have, on average, lower values of all the variables, as do older participants. The two types of physical activity are positively correlated, as are height and weight (to a larger extent). Age is negatively associated with all the explanatory variables.
4.4.2.2 Additional explorations
Here we use all the observed data (with repeated measurements) to explore the association between some of the variables, namely height and weight.
Data cleaning performed before data screening removed very low values of height (even if they were considered plausible by the data cleaning performed within SHARE - see details in the import file), but some low values of height might still be due to errors.
The association between the two variables is as expected.
Some very low values of height do not seem consistent with the corresponding weight values; for some of these data points the BMI is also large.
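A consistency check between height and weight of the kind described above can be sketched by computing the BMI and flagging extreme values. This is a hypothetical Python illustration: the record layout, the example values and the cut-off of 60 kg/m² are assumptions and are not taken from the report.

```python
def bmi(weight_kg, height_cm):
    """Body mass index in kg/m^2 from weight in kg and height in cm."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2

def flag_inconsistent(records, bmi_max=60.0):
    """Return the records whose BMI exceeds bmi_max (illustrative cut-off)."""
    return [r for r in records if bmi(r["weight"], r["height"]) > bmi_max]

# Hypothetical records (not SHARE data).
records = [
    {"id": "a", "weight": 75.0, "height": 178.0},  # BMI about 23.7
    {"id": "b", "weight": 90.0, "height": 110.0},  # BMI about 74.4, height probably wrong
]
flagged = flag_inconsistent(records)
```

Flagged records would still need to be reviewed case by case, since a large BMI alone does not tell whether the height or the weight is in error.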
4.4.3 Interactions between explanatory variables (V3)
The AS envisions the use of interactions between age and all time-fixed explanatory variables (sex, education, height); the main interest will be in the interpretation of the interaction between sex and functions of age. The descriptive statistics of all the explanatory variables stratified by age group and sex are reported in section V1.
4.4.3.1 Association between age and weight, stratified by physical activity
The possible interaction between age and status of vigorous/low intensity activity with respect to weight is explored here, as it might be of interest to domain experts. The decline of weight with age might be slightly less pronounced among those who perform vigorous physical activity.
Scatter plot by vigorous physical activity and sex
Scatter plot by low physical activity and gender
4.4.4 Stratification (VE1)
We stratify the univariate descriptions of the data by sex and age group first; we explore also the stratification by baseline wave.
4.4.4.1 Baseline measurements stratified by sex
We limit the exploration to baseline measurements (as the wave by wave exploration is conducted on complete data in VE1). The results reported below with tables and graphs indicate the following.
Females and males differed substantially in the distribution of height, weight, vigorous (but not low-intensity) physical activity, and education. Age was similar.
When data were stratified by sex, the distribution of grip strength was no longer asymmetric and bimodal, and it seems appropriate to assume a Gaussian distribution; digit preference was visible despite the automated method of measurement.
Baseline characteristics by sex.

| | N | Male N=2583 | Female N=2869 |
|---|---|---|---|
| Wave : Wave 1 | 5452 | 0.29 746/2583 | 0.30 850/2869 |
| Wave 2 | | 0.23 603/2583 | 0.24 699/2869 |
| Wave 4 | | 0.08 215/2583 | 0.07 199/2869 |
| Wave 5 | | 0.34 885/2583 | 0.35 998/2869 |
| Wave 6 | | 0.05 134/2583 | 0.04 123/2869 |
| age_int | 5452 | 53.00 60.00 69.00 62.24 ± 9.92 | 53.00 60.00 70.00 62.77 ± 10.66 |
| age_int_cat : 50-59 | 5452 | 0.48 1230/2583 | 0.47 1346/2869 |
| 60-69 | | 0.28 734/2583 | 0.27 768/2869 |
| 70-80 | | 0.18 472/2583 | 0.19 540/2869 |
| 80+ | | 0.06 147/2583 | 0.07 215/2869 |
| weight | 5361 | 75.0 82.0 92.0 84.0 ± 13.6 | 60.0 68.0 77.0 69.4 ± 13.2 |
| height_imp | 5418 | 173.00 178.00 183.00 178.09 ± 7.07 | 161.00 165.00 170.00 165.41 ± 6.11 |
| education_imp : Low | 5428 | 0.15 397/2569 | 0.28 794/2859 |
| Medium | | 0.47 1204/2569 | 0.32 926/2859 |
| High | | 0.38 968/2569 | 0.40 1139/2859 |
| pa_vig_freq | 5423 | 0.64 1632/2566 | 0.57 1642/2857 |
| pa_low_freq | 5422 | 0.91 2331/2566 | 0.89 2555/2856 |
| cusmoke_imp : Yes | 5423 | 0.27 696/2567 | 0.24 699/2856 |
| maxgrip | 5272 | 40.00 48.00 55.00 47.09 ± 10.28 | 24.00 28.00 33.00 28.02 ± 7.01 |

a b c represent the lower quartile a, the median b, and the upper quartile c for continuous variables. x ± s represents the mean ± 1 SD. N is the number of non-missing values.
4.4.4.2 Stratification based on grouped age at baseline and sex
As age is also the time metric from the AS, in this section we explore only some aspects (related to baseline measurement) of the association between age and the other variables, stratifying by sex. More detailed explorations are presented in the sections devoted to time trends. In most analyses participants are grouped in 10 year age groups.
The aim of this analysis is to identify independent variables that might be associated with age and sex.
Among women the association between age and education is stronger (older participants with lower education).
4.4.4.2.1 Females
Baseline characteristics by age category for females.

| | N | 50-59 N=1346 | 60-69 N=768 | 70-80 N=540 | 80+ N=215 |
|---|---|---|---|---|---|
| education_imp : Low | 2859 | 0.15 197/1341 | 0.27 208/766 | 0.47 253/539 | 0.64 136/213 |
| Medium | | 0.31 422/1341 | 0.37 283/766 | 0.31 166/539 | 0.26 55/213 |
| High | | 0.54 722/1341 | 0.36 275/766 | 0.22 120/539 | 0.10 22/213 |
| pa_vig_freq | 2857 | 0.68 908/1344 | 0.59 454/765 | 0.44 236/538 | 0.21 44/210 |
| pa_low_freq | 2856 | 0.93 1253/1344 | 0.93 710/765 | 0.85 455/537 | 0.65 137/210 |
| cusmoke_imp : Yes | 2856 | 0.28 380/1344 | 0.23 177/765 | 0.21 112/538 | 0.14 30/209 |
| weight | 2796 | 62.0 69.0 79.0 71.1 ± 13.3 | 60.0 68.0 76.0 69.4 ± 12.5 | 59.0 65.5 74.0 67.5 ± 13.4 | 55.0 62.0 70.0 63.2 ± 11.3 |
| height_imp | 2848 | 163.00 167.00 171.00 166.94 ± 5.97 | 161.00 165.00 169.00 164.93 ± 5.83 | 160.00 164.00 168.00 163.48 ± 5.67 | 158.00 162.00 167.00 162.13 ± 6.14 |
| maxgrip | 2760 | 28.00 31.00 35.00 31.27 ± 6.11 | 24.00 28.00 31.00 27.50 ± 5.85 | 20.00 24.00 27.00 23.81 ± 5.79 | 15.00 19.00 22.00 18.85 ± 5.29 |

a b c represent the lower quartile a, the median b, and the upper quartile c for continuous variables. x ± s represents the mean ± 1 SD. N is the number of non-missing values.
4.4.4.2.2 Males
Baseline characteristics by age category for males.

| | N | 50-59 N=1230 | 60-69 N=734 | 70-80 N=472 | 80+ N=147 |
|---|---|---|---|---|---|
| education_imp : Low | 2569 | 0.13 153/1222 | 0.14 103/732 | 0.21 101/470 | 0.28 40/145 |
| Medium | | 0.47 571/1222 | 0.48 352/732 | 0.46 217/470 | 0.44 64/145 |
| High | | 0.41 498/1222 | 0.38 277/732 | 0.32 152/470 | 0.28 41/145 |
| pa_vig_freq | 2566 | 0.72 879/1221 | 0.66 480/732 | 0.49 229/469 | 0.31 44/144 |
| pa_low_freq | 2566 | 0.94 1151/1222 | 0.93 680/732 | 0.87 405/468 | 0.66 95/144 |
| cusmoke_imp : Yes | 2567 | 0.31 375/1222 | 0.27 195/732 | 0.21 97/468 | 0.20 29/145 |
| weight | 2565 | 76.0 85.0 95.0 86.3 ± 13.8 | 75.0 83.0 90.0 84.2 ± 13.2 | 71.5 80.0 88.0 80.0 ± 12.3 | 70.0 75.0 80.0 75.9 ± 11.0 |
| height_imp | 2570 | 175.00 180.00 184.00 179.70 ± 6.87 | 173.00 178.00 183.00 178.22 ± 6.89 | 171.00 175.00 179.00 175.16 ± 6.52 | 169.75 173.00 178.00 173.26 ± 6.12 |
| maxgrip | 2512 | 47.00 53.00 58.00 51.95 ± 8.77 | 41.00 47.00 52.00 46.59 ± 8.20 | 34.00 40.00 45.00 39.47 ± 7.94 | 26.00 32.00 37.00 31.60 ± 8.42 |

a b c represent the lower quartile a, the median b, and the upper quartile c for continuous variables. x ± s represents the mean ± 1 SD. N is the number of non-missing values.
4.4.4.3 Stratification by wave of the baseline measurements
The overall summary of baseline measurements over waves is given in the table below (participants can be included in the study at different waves).
Overall baseline characteristics across waves.

| | N | Wave 1 N=1596 | Wave 2 N=1302 | Wave 4 N=414 | Wave 5 N=1883 | Wave 6 N=257 |
|---|---|---|---|---|---|---|
| gender : Female | 5452 | 0.53 850/1596 | 0.54 699/1302 | 0.48 199/414 | 0.53 998/1883 | 0.48 123/257 |
| age_int | 5452 | 56.00 62.00 72.00 64.40 ± 10.58 | 54.00 61.00 70.00 62.77 ± 10.01 | 51.00 52.00 54.00 53.24 ± 4.04 | 56.00 63.00 71.00 63.98 ± 10.02 | 51.00 52.00 52.00 53.78 ± 6.43 |
| age_int_cat : 50-59 | 5452 | 0.40 644/1596 | 0.44 576/1302 | 0.94 390/414 | 0.39 737/1883 | 0.89 229/257 |
| 60-69 | | 0.29 455/1596 | 0.30 393/1302 | 0.05 19/414 | 0.33 617/1883 | 0.07 18/257 |
| 70-80 | | 0.22 353/1596 | 0.19 250/1302 | 0.01 4/414 | 0.21 398/1883 | 0.03 7/257 |
| 80+ | | 0.09 144/1596 | 0.06 83/1302 | 0.00 1/414 | 0.07 131/1883 | 0.01 3/257 |
| weight | 5361 | 65.0 74.0 84.0 74.7 ± 14.6 | 65.0 75.0 85.0 75.5 ± 14.3 | 68.0 78.0 90.0 80.1 ± 16.9 | 65.0 75.0 86.0 76.9 ± 15.6 | 70.0 80.0 90.0 80.8 ± 15.4 |
| height_imp | 5418 | 164.00 170.00 177.00 170.46 ± 8.96 | 165.00 170.00 177.00 171.05 ± 9.01 | 167.00 174.00 180.00 173.94 ± 8.86 | 165.00 171.00 178.00 171.52 ± 9.20 | 168.00 174.00 181.00 174.46 ± 9.37 |
| education_imp : Low | 5428 | 0.26 406/1586 | 0.22 284/1293 | 0.10 42/410 | 0.22 422/1882 | 0.14 37/257 |
| Medium | | 0.44 696/1586 | 0.38 494/1293 | 0.35 143/410 | 0.37 698/1882 | 0.39 99/257 |
| High | | 0.31 484/1586 | 0.40 515/1293 | 0.55 225/410 | 0.40 762/1882 | 0.47 121/257 |
| pa_vig_freq | 5423 | 0.60 956/1591 | 0.50 636/1284 | 0.67 278/413 | 0.65 1215/1879 | 0.74 189/256 |
| pa_low_freq | 5422 | 0.88 1400/1592 | 0.89 1143/1284 | 0.94 388/413 | 0.92 1719/1877 | 0.92 236/256 |
| cusmoke_imp : Yes | 5423 | 0.31 499/1590 | 0.28 356/1284 | 0.26 106/413 | 0.20 380/1879 | 0.21 54/257 |
| maxgrip | 5272 | 26.0 34.0 46.0 36.1 ± 13.2 | 25.0 33.0 44.0 35.0 ± 12.6 | 33.0 43.0 55.0 43.7 ± 13.0 | 28.0 35.0 47.0 37.1 ± 12.2 | 33.0 40.0 54.8 43.0 ± 12.5 |

a b c represent the lower quartile a, the median b, and the upper quartile c for continuous variables. x ± s represents the mean ± 1 SD. N is the number of non-missing values.
Because the study design envisioned refreshment samples limited to the youngest age groups in Waves 4 and 6, the participants first included in those waves differed from those included in the other waves: they were substantially younger and were more frequently male. The differences in age and gender should explain the differences in the other variables: higher education and higher values of weight, height and grip strength.
The numerical variables are summarized also graphically.
4.5 Longitudinal aspects
4.5.1 Outcome variable - Profiles (L1)
Here the aim is to visualize the individual profiles of the outcome for the participants.
The number of subjects is very large, and profile plots of grip strength do not clearly convey the information about individual variability. To visualize the profiles effectively we use different strategies: we use selected subgroups of participants (100 per group, stratifying the plots by sex and age group) and different time metrics (age or measurement occasion). Interactive plots are also available (see the separate output page devoted to interactive plots).
4.5.1.1 Age as time metric
Overall, the profile plots highlight the trend towards diminishing grip strength with age, and the rate of change seems to accelerate with age (the slope at later ages is steeper than at the beginning). Older participants are followed up for shorter times; substantial increases or decreases in grip strength between measurements are possible. The variability of the outcome tends to decrease at later measurement occasions, especially in the older age groups.
All profiles
Subsets of profiles
Here we display the profiles of approximately 400 individuals for each sex group.
4.5.1.2 Measurement occasion as time metric
Here we show the profile plots by measurement occasion. In our case study the plots based on measurement occasion and stratified by age group are more informative than those based on age, as participants enter the study at different ages. Even though age was included as a continuous time metric in the analysis strategy, a summary stratified by ten-year groups can serve as a quick overview of the longitudinal trends by age. The plots based on subsets are more easily interpretable also with this time metric.
4.5.1.3 Subsets of profile plots
Profile plots of grip strength are obtained by choosing 100 subjects for each age/sex category, each with a baseline value of grip strength at a given quantile of the distribution (100 quantiles from 0.00001 to 1, in steps of 0.01). The plot includes only subjects with at least three valid measurements (this can be changed). This type of plot substitutes the classical profile plot in this application.
We also show the profiles of the participants with complete follow-up (7 measurements).
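The quantile-based selection of profiles described above can be sketched as follows. This is a simplified Python illustration, not the report's code: `pick_by_quantile` is a hypothetical helper, ties are broken by subject id, and the quantile grid (0.00, 0.01, ..., 0.99) only approximates the one stated in the text.

```python
def pick_by_quantile(baseline, n_pick=100):
    """Pick up to n_pick subject ids spread evenly over the baseline distribution.

    baseline maps subject id -> baseline grip strength. Subjects are ranked by
    baseline value and the subject at each of n_pick equally spaced quantiles
    is kept.
    """
    ranked = sorted(baseline, key=lambda sid: (baseline[sid], sid))
    n = len(ranked)
    if n <= n_pick:
        return ranked
    # Integer arithmetic avoids floating point surprises in the rank index.
    return [ranked[(k * n) // n_pick] for k in range(n_pick)]

# Hypothetical data: 1000 subjects whose baseline value equals their id.
baseline = {i: float(i) for i in range(1000)}
picked = pick_by_quantile(baseline)
```

In practice the selection would be applied separately within each age/sex category, after restricting to subjects with the required number of valid measurements.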
4.5.2 Trends of the outcome variable (L2)
This section shows the distribution of grip strength at baseline and at each wave and measurement occasion, aggregating the data rather than focusing on the individual profiles.
4.5.2.0.1 Distribution of the outcome by Wave (stratified by sex)
Here we further explore the distribution of the outcome by wave, stratifying by sex.
Males
Distribution of grip strength by Wave for males.

| | N | Wave 1 N=746 | Wave 2 N=1157 | Wave 3 N=910 | Wave 4 N=990 | Wave 5 N=1848 | Wave 6 N=1656 | Wave 7 N=1421 |
|---|---|---|---|---|---|---|---|---|
| maxgrip | 8484 | 40.00 47.00 54.00 46.47 ± 10.42 | 38.00 45.00 52.00 44.45 ± 10.16 | 40.00 47.00 53.00 46.48 ± 9.97 | 40.25 48.00 55.00 47.26 ± 10.29 | 40.00 47.00 54.00 46.60 ± 9.91 | 40.00 47.00 54.00 46.78 ± 9.69 | 40.00 46.00 53.00 46.30 ± 9.29 |

a b c represent the lower quartile a, the median b, and the upper quartile c for continuous variables. x ± s represents the mean ± 1 SD. N is the number of non-missing values.
Females
Distribution of grip strength by Wave for females.

| | N | Wave 1 N=850 | Wave 2 N=1330 | Wave 3 N=1069 | Wave 4 N=1122 | Wave 5 N=2071 | Wave 6 N=1858 | Wave 7 N=1604 |
|---|---|---|---|---|---|---|---|---|
| maxgrip | 9445 | 22.00 27.00 32.00 26.87 ± 7.26 | 21.00 26.00 30.00 25.77 ± 6.81 | 23.00 27.00 32.00 27.19 ± 6.88 | 24.00 28.00 33.00 28.09 ± 6.76 | 24.00 28.00 33.00 28.10 ± 6.52 | 24.00 28.00 33.00 28.25 ± 6.64 | 24.00 28.00 32.00 27.94 ± 6.29 |

a b c represent the lower quartile a, the median b, and the upper quartile c for continuous variables. x ± s represents the mean ± 1 SD. N is the number of non-missing values.
4.5.2.1 Distribution of outcome by measurement occasion (stratified by sex, histograms and boxplots)
Average grip strength declines with measurement occasion, and the average decline is larger for men than for women; the variability and the sample size also decrease at later measurement occasions.
Males
Distribution of grip strength by measurement occasion for males.

| | N | 1 N=2583 | 2 N=1983 | 3 N=1562 | 4 N=940 | 5 N=720 | 6 N=646 | 7 N=294 |
|---|---|---|---|---|---|---|---|---|
| maxgrip | 8484 | 40.00 48.00 55.00 47.09 ± 10.28 | 41.00 48.00 54.00 46.95 ± 9.85 | 40.00 47.00 53.00 46.52 ± 9.78 | 40.00 47.00 53.00 46.21 ± 9.66 | 39.00 45.00 52.00 44.86 ± 9.81 | 38.00 45.00 50.00 44.07 ± 9.28 | 38.00 44.00 50.00 43.86 ± 9.05 |

a b c represent the lower quartile a, the median b, and the upper quartile c for continuous variables. x ± s represents the mean ± 1 SD. N is the number of non-missing values.
Females
Distribution of grip strength by measurement occasion for females.

| | N | 1 N=2869 | 2 N=2228 | 3 N=1801 | 4 N=1059 | 5 N=861 | 6 N=748 | 7 N=338 |
|---|---|---|---|---|---|---|---|---|
| maxgrip | 9445 | 24.00 28.00 33.00 28.02 ± 7.01 | 24.00 28.00 32.00 27.77 ± 6.92 | 24.00 28.00 32.00 27.91 ± 6.52 | 24.00 28.00 32.00 27.52 ± 6.45 | 23.00 27.00 31.00 26.79 ± 6.43 | 22.00 27.00 30.00 26.52 ± 6.18 | 21.00 25.00 30.00 25.33 ± 5.92 |

a b c represent the lower quartile a, the median b, and the upper quartile c for continuous variables. x ± s represents the mean ± 1 SD. N is the number of non-missing values.
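The size of the decline by measurement occasion can be read off the two tables above. The short Python sketch below only redoes that arithmetic with the tabulated means. Note that these are means of the participants observed at each occasion (the group shrinks from occasion to occasion), not within-person changes.

```python
# Mean grip strength at measurement occasions 1 to 7, copied from the tables above.
means = {
    "Males": [47.09, 46.95, 46.52, 46.21, 44.86, 44.07, 43.86],
    "Females": [28.02, 27.77, 27.91, 27.52, 26.79, 26.52, 25.33],
}

# Absolute difference between the first and the seventh occasion.
decline = {sex: round(m[0] - m[-1], 2) for sex, m in means.items()}

# The same difference relative to the mean at the first occasion.
relative = {sex: round((m[0] - m[-1]) / m[0], 3) for sex, m in means.items()}
```

The absolute difference is 3.23 for males and 2.69 for females; relative to the first-occasion mean these correspond to about 0.069 and 0.096, respectively.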
4.5.2.2 Distribution of outcome by age time metric (stratified by sex, histograms and boxplots)
Grip strength decreased with age. Individual variations are displayed with profile plots later in the report (domain V7); the variability of the outcome at different ages is also explored later (domain V8).
4.5.3 Outcome variable - Correlation and variability (L3)
Here we use complete pairs of observations and the Pearson correlation to quantify the correlation between measurements taken in different waves or at different measurement occasions.
The following explorations evaluate the correlations using waves, measurement occasions, time since baseline, and age as time metrics. These explorations can be useful for determining the characteristics of the outcome based on different time metrics. Using waves we can identify some systematic errors due to wave, while measurement occasion/age is more directly related to the research question (decline of grip strength in time/with age).
The variability of the outcome at different ages is explored only for age as a time metric.
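The use of complete pairs can be made concrete with a small sketch. The Python function below is a hypothetical illustration (names and data are not from the report): for two waves it keeps only the participants measured at both and computes the Pearson correlation on those pairs.

```python
import math

def pearson_complete_pairs(x, y):
    """Pearson correlation using only positions where both values are observed.

    x and y are equally long lists, one entry per participant; None marks a
    missing measurement. Returns (correlation, number of complete pairs).
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return None, n
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy), n

# Hypothetical grip strength values for five participants at two waves.
wave_a = [40, 45, None, 50, 38]
wave_b = [38, 44, 41, None, 37]
r, n_pairs = pearson_complete_pairs(wave_a, wave_b)
```

Because each pair of waves uses its own set of participants, the number of complete pairs should be reported or checked alongside each correlation.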
4.5.3.1 Wave as time metric
The correlations between subsequent measurements are very large (about 0.90) and decrease slightly for larger time differences.
The large correlations are driven by the separation of the values of males and females. Below are shown the correlation matrices for males and females, separately, and the scatterplots of the measurements.
The correlations are slightly lower for females than for males. It is interesting to note that the correlation between measurements taken further apart decreases more substantially when the sexes are analyzed separately.
Matrix with correlations (above the diagonal), SD (on the diagonal) and covariances (under the diagonal), males
| | Wave 1 | Wave 2 | Wave 3 | Wave 4 | Wave 5 | Wave 6 | Wave 7 |
|---|---|---|---|---|---|---|---|
| Wave 1 | 10.4 | 0.82 | 0.81 | 0.79 | 0.74 | 0.74 | 0.71 |
| Wave 2 | 80.8 | 10.16 | 0.84 | 0.84 | 0.79 | 0.77 | 0.75 |
| Wave 3 | 78.7 | 79.39 | 9.97 | 0.88 | 0.83 | 0.80 | 0.79 |
| Wave 4 | 73.3 | 77.13 | 85.30 | 10.29 | 0.86 | 0.84 | 0.84 |
| Wave 5 | 69.7 | 67.94 | 75.49 | 83.67 | 9.91 | 0.87 | 0.85 |
| Wave 6 | 64.2 | 64.78 | 68.31 | 77.01 | 77.18 | 9.69 | 0.87 |
| Wave 7 | 57.7 | 57.11 | 62.50 | 70.22 | 69.34 | 72.31 | 9.29 |
Matrix with correlations (above the diagonal), SD (on the diagonal) and covariances (under the diagonal), females
| | Wave 1 | Wave 2 | Wave 3 | Wave 4 | Wave 5 | Wave 6 | Wave 7 |
|---|---|---|---|---|---|---|---|
| Wave 1 | 7.26 | 0.76 | 0.73 | 0.73 | 0.73 | 0.66 | 0.59 |
| Wave 2 | 35.01 | 6.81 | 0.80 | 0.80 | 0.79 | 0.75 | 0.72 |
| Wave 3 | 33.12 | 35.65 | 6.88 | 0.81 | 0.78 | 0.75 | 0.72 |
| Wave 4 | 31.54 | 33.31 | 34.68 | 6.76 | 0.85 | 0.82 | 0.78 |
| Wave 5 | 29.07 | 31.06 | 31.37 | 34.68 | 6.52 | 0.82 | 0.81 |
| Wave 6 | 25.77 | 30.18 | 30.80 | 33.75 | 33.45 | 6.64 | 0.82 |
| Wave 7 | 22.19 | 26.97 | 27.12 | 29.78 | 30.37 | 32.40 | 6.29 |
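The layout of the matrices above (correlations above the diagonal, SDs on the diagonal, covariances below it) can be sketched as follows. This is a hypothetical Python illustration, not the report's code. It assumes that the diagonal uses all available measurements of a wave, while each off-diagonal entry uses the complete pairs of the two waves involved.

```python
import math

def combined_matrix(columns):
    """Correlations above the diagonal, SDs on it, covariances below it.

    columns is a list of equally long lists (one per wave); None marks a
    missing measurement.
    """
    k = len(columns)
    out = [[None] * k for _ in range(k)]
    for i in range(k):
        xi = [v for v in columns[i] if v is not None]
        mi = sum(xi) / len(xi)
        out[i][i] = math.sqrt(sum((v - mi) ** 2 for v in xi) / (len(xi) - 1))
        for j in range(i + 1, k):
            pairs = [(a, b) for a, b in zip(columns[i], columns[j])
                     if a is not None and b is not None]
            n = len(pairs)
            ma = sum(a for a, _ in pairs) / n
            mb = sum(b for _, b in pairs) / n
            cov = sum((a - ma) * (b - mb) for a, b in pairs) / (n - 1)
            sa = math.sqrt(sum((a - ma) ** 2 for a, _ in pairs) / (n - 1))
            sb = math.sqrt(sum((b - mb) ** 2 for _, b in pairs) / (n - 1))
            out[i][j] = cov / (sa * sb)  # correlation, above the diagonal
            out[j][i] = cov              # covariance, below the diagonal
    return out

# Hypothetical data: two waves, the last participant misses the second wave.
m = combined_matrix([[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, None]])
```

Under these assumptions a covariance below the diagonal need not equal the correlation multiplied by the two diagonal SDs, because the diagonal and the off-diagonal entries are based on different sets of participants.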
Generalized pairs plot
4.5.3.2 Measurement occasion as time metric
Correlation matrix
Matrix with correlations (above the diagonal), SD (on the diagonal) and covariances (under the diagonal)
| | M1 | M2 | M3 | M4 | M5 | M6 | M7 |
|---|---|---|---|---|---|---|---|
| M1 | 12.9 | 0.92 | 0.92 | 0.91 | 0.89 | 0.88 | 0.87 |
| M2 | 148.2 | 12.77 | 0.93 | 0.92 | 0.91 | 0.90 | 0.90 |
| M3 | 142.9 | 143.15 | 12.40 | 0.94 | 0.93 | 0.92 | 0.91 |
| M4 | 143.9 | 146.48 | 146.41 | 12.38 | 0.94 | 0.93 | 0.93 |
| M5 | 135.3 | 138.36 | 137.43 | 137.07 | 12.15 | 0.94 | 0.94 |
| M6 | 130.3 | 133.87 | 131.54 | 131.04 | 132.94 | 11.72 | 0.94 |
| M7 | 135.2 | 137.11 | 138.77 | 135.46 | 137.06 | 134.60 | 11.96 |
Separate for males and females
The variance decreased with measurement occasion, and the correlations decreased with larger time lags.
Males
| | M1 | M2 | M3 | M4 | M5 | M6 | M7 |
|---|---|---|---|---|---|---|---|
| M1 | 10.3 | 0.83 | 0.82 | 0.79 | 0.74 | 0.73 | 0.71 |
| M2 | 78.8 | 9.85 | 0.86 | 0.83 | 0.80 | 0.78 | 0.76 |
| M3 | 75.9 | 78.06 | 9.78 | 0.87 | 0.84 | 0.81 | 0.82 |
| M4 | 71.1 | 75.19 | 82.14 | 9.66 | 0.87 | 0.83 | 0.85 |
| M5 | 66.0 | 70.77 | 77.77 | 78.94 | 9.81 | 0.86 | 0.87 |
| M6 | 59.6 | 64.57 | 66.46 | 65.86 | 70.82 | 9.28 | 0.89 |
| M7 | 57.7 | 59.17 | 64.78 | 67.01 | 71.27 | 72.79 | 9.05 |
Females
| | M1 | M2 | M3 | M4 | M5 | M6 | M7 |
|---|---|---|---|---|---|---|---|
| M1 | 7.01 | 0.78 | 0.78 | 0.77 | 0.74 | 0.71 | 0.59 |
| M2 | 36.06 | 6.92 | 0.81 | 0.79 | 0.75 | 0.74 | 0.65 |
| M3 | 33.93 | 34.21 | 6.52 | 0.84 | 0.80 | 0.78 | 0.69 |
| M4 | 34.18 | 33.73 | 34.99 | 6.45 | 0.84 | 0.80 | 0.70 |
| M5 | 30.68 | 30.67 | 31.56 | 31.29 | 6.43 | 0.85 | 0.78 |
| M6 | 27.92 | 28.56 | 28.90 | 28.14 | 32.61 | 6.18 | 0.78 |
| M7 | 22.19 | 21.45 | 23.11 | 21.59 | 24.64 | 26.18 | 5.92 |
4.5.3.3 Age as time metric (two-year groups, from 50 years old)
Data were grouped in two-year categories to obtain bigger groups. Two years were used because the difference between waves is usually two years, so consecutive measurements at the individual level are usually taken every two years. Only estimates based on at least 20 observations are shown.
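The grouping into two-year age categories starting at age 50 can be sketched as follows. This Python illustration is hypothetical (function names and data are not from the report). For simplicity the minimum of 20 observations is applied here to the group size, whereas the report applies it to the number of observations behind each estimate.

```python
from collections import defaultdict

def two_year_group(age, start=50):
    """Lower bound of the two-year age group containing an integer age (50-51, 52-53, ...)."""
    return start + 2 * ((age - start) // 2)

def group_values(ages, values, min_n=20, start=50):
    """Collect values by two-year age group, keeping groups with at least min_n values."""
    groups = defaultdict(list)
    for age, value in zip(ages, values):
        if value is not None and age >= start:
            groups[two_year_group(age, start)].append(value)
    return {g: v for g, v in groups.items() if len(v) >= min_n}

# Hypothetical data: 20 measurements at ages 50-51 and only 5 at age 52.
ages = [50] * 10 + [51] * 10 + [52] * 5
values = [30.0] * 25
kept = group_values(ages, values)
```

Only the 50-51 group is retained in this example, because the 52-53 group has fewer than 20 values.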
4.5.3.4 Correlations
Note that here we use age_int, which is an integer value.
The correlations between grip strength measured at consecutive ages are very large, but they decrease with age for measurements that are far apart. Measurements for younger participants are more correlated than those for older participants.
The large correlations are, also here, driven by the separation of the values of males and females.
As an example, see below the scatterplots of the values of grip strength for the individuals aged between 50 and 59 (grouped in two-year categories), by sex. The within-sex correlations are much weaker than the overall correlations.
Below are the complete correlation matrices separately for the two sexes and the boxplots of the correlations. Note that the estimates appear less stable, as they are based on smaller groups. Only estimates based on more than 20 observations are displayed.
Also in the separate analyses the correlations appear to diminish as the age difference increases.
The correlations are also shown with an alternative graphical display that makes numerical comparisons easier.
The estimated correlations for large lags appear very variable, especially for males and for the oldest participants. The correlations decrease at larger lags.
4.5.3.5 Variability
The graph previously displayed in the longitudinal trends (L2) domain can be used also to assess how the variability of the measurements varies with age - for example, to identify possible problems with the hypothesis of constant variance.
The graph below shows, in a single display, the average, standard deviation and coefficient of variation of the outcome, grouping the participants in two-year groups. The SD decreases with age, as does the mean, while the CV increases.
Note that these graphs are produced using all the longitudinal data; the findings based on the baseline data are similar (trends in SD are less visible at older ages - due to smaller sample sizes?).
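The coefficient of variation is the SD divided by the mean. As a small worked example, the Python sketch below computes it from the baseline means and SDs of grip strength reported by ten-year age group in section 4.4.4.2 (the graphs themselves use two-year groups and all longitudinal data, so this is only an approximation of what they show).

```python
# Mean and SD of baseline grip strength by age group, copied from the tables
# "Baseline characteristics by age category" for females and for males.
baseline = {
    "Females": {"50-59": (31.27, 6.11), "60-69": (27.50, 5.85),
                "70-80": (23.81, 5.79), "80+": (18.85, 5.29)},
    "Males": {"50-59": (51.95, 8.77), "60-69": (46.59, 8.20),
              "70-80": (39.47, 7.94), "80+": (31.60, 8.42)},
}

def coefficient_of_variation(mean, sd):
    """CV = SD / mean."""
    return sd / mean

cv = {sex: {grp: round(coefficient_of_variation(m, s), 3)
            for grp, (m, s) in groups.items()}
      for sex, groups in baseline.items()}
```

For both sexes the CV increases from the youngest to the oldest age group (from about 0.195 to 0.281 for females and from about 0.169 to 0.266 for males), because the mean falls faster than the SD.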
4.5.4 Trends of time-varying explanatory variables (L4)
Here we describe how the independent variables vary across waves, measurement occasions and age. Age is the time metric chosen in the AS; measurement occasion is used to summarize the time since inclusion in the study.
4.5.4.1 Stratification by wave
Here we summarize the variables across waves, including baseline and longitudinal interviews. We also include variables that do not change across time, such as education and smoking at baseline, to give an overall comparison of the participation across waves.
Overall characteristics across waves.

| | N | Wave 1 N=1596 | Wave 2 N=2487 | Wave 3 N=1979 | Wave 4 N=2112 | Wave 5 N=3919 | Wave 6 N=3514 | Wave 7 N=3025 |
|---|---|---|---|---|---|---|---|---|
| gender : Female | 18632 | 0.53 850/1596 | 0.53 1330/2487 | 0.54 1069/1979 | 0.53 1122/2112 | 0.53 2071/3919 | 0.53 1858/3514 | 0.53 1604/3025 |
| age_int | 18632 | 56.00 62.00 72.00 64.40 ± 10.58 | 56.00 63.00 72.00 64.53 ± 10.30 | 58.00 64.00 73.00 65.79 ± 9.93 | 57.00 64.00 72.00 65.11 ± 10.53 | 57.00 64.00 72.00 65.39 ± 10.08 | 58.00 65.00 72.00 65.77 ± 10.03 | 60.00 66.00 73.00 67.23 ± 9.52 |
| age_int_cat : 50-59 | 18632 | 0.40 644/1596 | 0.38 946/2487 | 0.32 641/1979 | 0.36 754/2112 | 0.34 1317/3919 | 0.32 1121/3514 | 0.25 748/3025 |
| 60-69 | | 0.29 455/1596 | 0.31 783/2487 | 0.35 687/1979 | 0.33 707/2112 | 0.35 1376/3919 | 0.35 1231/3514 | 0.37 1122/3025 |
| 70-80 | | 0.22 353/1596 | 0.22 538/2487 | 0.23 448/1979 | 0.21 438/2112 | 0.22 859/3919 | 0.23 820/3514 | 0.28 845/3025 |
| 80+ | | 0.09 144/1596 | 0.09 220/2487 | 0.10 203/1979 | 0.10 213/2112 | 0.09 367/3919 | 0.10 342/3514 | 0.10 310/3025 |
| weight | 16356 | 65.0 74.0 84.0 74.7 ± 14.6 | 65.0 75.0 85.0 75.5 ± 14.7 | | 65.0 75.0 85.0 76.4 ± 15.3 | 65.0 75.0 85.0 76.6 ± 15.4 | 65.0 76.0 86.5 77.2 ± 15.8 | 66.0 76.0 87.0 77.8 ± 15.9 |
| education_imp : Low | 18570 | 0.26 406/1586 | 0.23 566/2472 | 0.22 429/1972 | 0.19 390/2105 | 0.20 769/3908 | 0.18 627/3507 | 0.17 510/3020 |
| Medium | | 0.44 696/1586 | 0.41 1012/2472 | 0.40 798/1972 | 0.40 843/2105 | 0.39 1510/3908 | 0.39 1368/3507 | 0.38 1153/3020 |
| High | | 0.31 484/1586 | 0.36 894/2472 | 0.38 745/1972 | 0.41 872/2105 | 0.42 1629/3908 | 0.43 1512/3507 | 0.45 1357/3020 |
| pa_vig_freq | 14709 | 0.60 956/1591 | 0.50 1226/2430 | | 0.53 1097/2077 | 0.61 2405/3913 | 0.62 2178/3509 | 0.59 704/1189 |
| pa_low_freq | 14709 | 0.88 1400/1592 | 0.89 2165/2430 | | 0.89 1843/2077 | 0.90 3510/3911 | 0.91 3182/3509 | 0.88 1042/1190 |
| maxgrip | 17929 | 26.0 34.0 46.0 36.1 ± 13.2 | 25.0 32.0 44.0 34.6 ± 12.6 | 27.0 34.0 46.0 36.2 ± 12.8 | 27.0 35.0 47.0 37.2 ± 12.9 | 27.0 35.0 47.0 36.9 ± 12.4 | 28.0 35.0 47.0 37.0 ± 12.4 | 27.0 35.0 46.0 36.7 ± 12.1 |

a b c represent the lower quartile a, the median b, and the upper quartile c for continuous variables. x ± s represents the mean ± 1 SD. N is the number of non-missing values.
Age increased across waves, despite the presence of refreshment samples, as did the level of education. The proportion of females remained rather stable.
4.5.4.2 Graphical display of time-varying variables across waves
Only numerical variables are displayed graphically.
4.5.4.3 Stratification based on measurement occasion
4.5.4.3.1 Physical activity
In the following summaries we consider only interviews where PA was not missing by design. Missing data are not reported (as they are rare if not missing by design, and were reported in the missing data section).
As expected, the proportion of participants that report vigorous or low intensity physical activity slightly declines with measurement occasion and, more substantially, with age.
Interestingly, individuals do not necessarily always decrease their amount of physical activity, as shown in the parallel plots. For vigorous physical activity a transition from active (1) to non-active (0) is more likely than the opposite, while the reverse is true for low intensity physical activity.
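Transitions between active (1) and non-active (0) status can be tabulated by counting the moves between consecutive observed measurement occasions. The following Python sketch is a hypothetical illustration (function name and data are not from the report). It assumes that missing occasions are simply skipped, so that the two nearest observed occasions are treated as consecutive.

```python
from collections import Counter

def transition_counts(histories):
    """Count 0->0, 0->1, 1->0 and 1->1 moves between consecutive observed occasions.

    histories is a list of per-participant lists of 0/1 values by measurement
    occasion; None marks an occasion without a valid answer.
    """
    counts = Counter()
    for history in histories:
        observed = [v for v in history if v is not None]
        for prev, curr in zip(observed, observed[1:]):
            counts[(prev, curr)] += 1
    return counts

# Hypothetical activity histories for three participants.
histories = [[1, 1, 0], [1, None, 0, 0], [0, 1, 1]]
counts = transition_counts(histories)
```

Comparing `counts[(1, 0)]` with `counts[(0, 1)]` gives the direction in which transitions are more frequent.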
4.5.4.3.2 Vigorous physical activity
| | M1 | M2 | M3 | M4 | M5 | M6 | M7 |
|---|---|---|---|---|---|---|---|
| All: 0 | 2149 | 1204 | 490 | 792 | 699 | 567 | 242 |
| All: 1 | 3274 | 1763 | 626 | 906 | 881 | 792 | 324 |
| All: prop 1 | 0.60 | 0.59 | 0.56 | 0.53 | 0.56 | 0.58 | 0.57 |
| Males: 0 | 934 | 517 | 224 | 326 | 285 | 246 | 98 |
| Males: 1 | 1632 | 886 | 299 | 459 | 434 | 381 | 159 |
| Males: prop 1 | 0.64 | 0.63 | 0.57 | 0.58 | 0.60 | 0.61 | 0.62 |
| Females: 0 | 1215 | 687 | 266 | 466 | 414 | 321 | 144 |
| Females: 1 | 1642 | 877 | 327 | 447 | 447 | 411 | 165 |
| Females: prop 1 | 0.57 | 0.56 | 0.55 | 0.49 | 0.52 | 0.56 | 0.53 |
Age as time metric
We display a graphical summary of the association between age at interview and vigorous physical activity; all the data are used (the same participant can contribute more than one measurement).
Vigorous physical activity decreases sharply for participants aged 65 or older.
4.5.4.3.3 Low intensity physical activity
| | M1 | M2 | M3 | M4 | M5 | M6 | M7 |
|---|---|---|---|---|---|---|---|
| All: 0 | 536 | 259 | 108 | 214 | 208 | 166 | 76 |
| All: 1 | 4886 | 2708 | 1008 | 1484 | 1372 | 1194 | 490 |
| All: prop 1 | 0.90 | 0.91 | 0.90 | 0.87 | 0.87 | 0.88 | 0.87 |
| Males: 0 | 235 | 110 | 51 | 88 | 84 | 75 | 28 |
| Males: 1 | 2331 | 1293 | 472 | 697 | 635 | 553 | 229 |
| Males: prop 1 | 0.91 | 0.92 | 0.90 | 0.89 | 0.88 | 0.88 | 0.89 |
| Females: 0 | 301 | 149 | 57 | 126 | 124 | 91 | 48 |
| Females: 1 | 2555 | 1415 | 536 | 787 | 737 | 641 | 261 |
| Females: prop 1 | 0.89 | 0.90 | 0.90 | 0.86 | 0.86 | 0.88 | 0.84 |
We further explore the possible effect of birth cohort in LE1.
4.5.5 Evaluation of possible age-cohort effects (LE1)
The following graphs show the smoothed association between age and grip strength, evaluated using baseline measurements (blue), all longitudinal data (f), age-cohort trajectories (red lines, grouping participants in 5 year groups based on their age at baseline). The graphs are shown separately for men and women.
4.5.5.1 Overall description
Here we define the birth cohort variable that will be used to explore the possible presence of cohort effects in some of the characteristics of the participants. Participants are grouped in 10-year groups, except for the oldest cohort (which spans 19 years because of its small sample size). There is a strong association between age and birth cohort due to the design of the study. The association is present both when analyzing all data (first graph) and when analyzing just the first interview (second graph).
4.5.5.2 Association of birth cohort with outcome
The following graphs show the smoothed association between age and grip strength, evaluated using baseline measurements (black solid line), all longitudinal data (black dashed line), and year-of-birth-cohort trajectories (colored lines described in the legend, grouping participants in 5-year groups based on their year of birth; a larger grouping is used for extreme years, where fewer participants were included). The graphs are shown separately for men and women.
There is a clear birth cohort effect.
In a similar way we also explored the longitudinal age effect grouping the participants that belonged to the same age group, defined in 5-year groups.
4.5.5.3 Association of birth cohort and physical activity
When the summaries of physical activity are stratified by birth cohort, we observe that there is not much decline with age for the younger cohorts, while the decline is very steep for the oldest cohort (which includes all the oldest participants). The cohorts differ in their engagement in vigorous PA. Among women the effect is different.
When the summary is stratified by birth cohort, we observe that there is not much decline with age for the younger participants (belonging to younger birth cohorts), while the decline is very steep for the oldest cohort (which includes all the oldest participants). The cohorts differ in their engagement in vigorous PA. The smoothed estimates by cohort are much more variable for women.
Explanation for the effect for women???
4.6 Output for data analysis
Outputted datasets
Data in long format: share1_withflags - ** add later **
The matrices with the information about the missing value structure are in wide format, each column indicates a Wave.
df.missing_cv (by Wave, wide format)
Codes: -999: not yet included in the study; 1: interview available; -10 missing interview; -1000: out of sample/missing by design.
Additionally, the df.missing_5cat_cv data set distinguishes lost to follow up and intermittent missingness (1: interview done, -10: lost to FU, -11: intermittent missingness, -12: lost to follow up, -100: death, -1000: out of sample/missing by design, -999: not yet included in the study). The df.missing_6cat_cv data set uses the same codes but separates out of sample from missing by design (1: interview done, -10: lost to FU, -11: intermittent missingness, -12: lost to follow up, -100: death, -1000: out of sample, -999: not yet included in the study, -1001: missing by design).
death.status.waves is the matrix with the Death/Alive/Unknown indication (by Wave, wide format). NAs in the measurement occasion matrix indicate that the measurement was not obtained because the study ended.
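The distinction between intermittent missingness and loss to follow up can be illustrated with a small sketch. The Python function below is hypothetical and deliberately simplified: it ignores death, out of sample and missing by design, and it assumes that a missed wave followed by a later interview is intermittent (-11), while missed waves after the last interview are lost to follow up (-12). The codes are those listed above; the rule itself is an assumption, not the report's definition.

```python
NOT_YET_INCLUDED = -999
INTERVIEW_DONE = 1
INTERMITTENT = -11
LOST_TO_FOLLOW_UP = -12

def code_waves(interviewed):
    """Code one participant's waves from a list of booleans (interview done or not)."""
    if True not in interviewed:
        return [NOT_YET_INCLUDED] * len(interviewed)
    first = interviewed.index(True)
    last = len(interviewed) - 1 - interviewed[::-1].index(True)
    codes = []
    for wave, done in enumerate(interviewed):
        if wave < first:
            codes.append(NOT_YET_INCLUDED)   # before the first interview
        elif done:
            codes.append(INTERVIEW_DONE)
        elif wave < last:
            codes.append(INTERMITTENT)       # missed, but interviewed again later
        else:
            codes.append(LOST_TO_FOLLOW_UP)  # no interview after this wave
    return codes

# Hypothetical participant: enters at Wave 2, misses Wave 3, last seen at Wave 4.
codes = code_waves([False, True, False, True, False, False, False])
```

Each participant yields one row in wide format, with one code per wave, which matches the layout of the missingness matrices described above.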
Complete coverscreen information (wide format) is also available and could be exported: cv.all